Estimation of a Panel Data Sample Selection Model

Estimation of a Panel Data Sample Selection Model Ekaterini Kyriazidou Econometrica, Vol. 65, No. 6. (Nov., 1997), pp. 1335-1364. Stable URL: http://links.jstor.org/sici?sici=0012-9682%28199711%2965%3A6%3C1335%3AEOAPDS%3E2.0.CO%3B2-B Econometrica is currently published by The Econometric Society. Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/journals/econosoc.html. Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. The JSTOR Archive is a trusted digital repository providing for long-term preservation and access to leading academic journals and scholarly literature from around the world. The Archive is supported by libraries, scholarly societies, publishers, and foundations. It is an initiative of JSTOR, a not-for-profit organization with a mission to help the scholarly community take advantage of advances in technology. For more information regarding JSTOR, please contact [email protected]. http://www.jstor.org Tue Aug 14 12:23:18 2007



Econometrica, Vol. 65, No. 6 (November, 1997), 1335-1364

ESTIMATION OF A PANEL DATA SAMPLE SELECTION MODEL

We consider the problem of estimation in a panel data sample selection model, where both the selection and the regression equation of interest contain unobservable individual-specific effects. We propose a two-step estimation procedure, which differences out both the sample selection effect and the unobservable individual effect from the equation of interest. In the first step, the unknown coefficients of the selection equation are consistently estimated. The estimates are then used to estimate the regression equation of interest. The estimator proposed in this paper is consistent and asymptotically normal, with a rate of convergence that can be made arbitrarily close to $n^{-1/2}$, depending on the strength of certain smoothness assumptions. The finite sample properties of the estimator are investigated in a small Monte Carlo simulation.

KEYWORDS: Sample selection, panel data, individual-specific effects.

1. INTRODUCTION

SAMPLE SELECTION IS A PROBLEM frequently encountered in applied research. It arises as a result of either self-selection by the individuals under investigation, or sample selection decisions made by data analysts. A classic example, studied in the seminal work of Gronau (1974) and Heckman (1976), is female labor supply, where hours worked are observed only for those women who decide to participate in the labor force. Failure to account for sample selection is well known to lead to inconsistent estimation of the behavioral parameters of interest, as these are confounded with parameters that determine the probability of entry into the sample. In recent years a vast amount of econometric literature has been devoted to the problem of controlling for sample selectivity. The research, however, has almost exclusively focused on the cross-sectional data case. See Powell (1994) for a review of this literature and for references. In contrast, this paper focuses on the case where the researcher has panel or longitudinal data available.² Sample selectivity is as acute a problem in panel as in cross section data. In addition, panel data sets are commonly characterized by nonrandomly missing observations due to sample attrition.

¹ This paper is based on Chapter 1 of my thesis, completed at Northwestern University, Evanston, Illinois. I wish to thank my thesis advisor, Bo Honoré, for invaluable help and support during this project. Many individuals, among them a co-editor and two anonymous referees, have offered useful comments and suggestions, for which I am very grateful. Joel Horowitz kindly provided a computer program used in this study. An earlier version of the paper was presented at the North American Summer Meetings of the Econometric Society, June, 1994. Financial support from NSF through Grant No. SES-9210037 to Bo Honoré is gratefully acknowledged. All remaining errors are my responsibility. An Appendix, which contains a proof of a theorem not included in the paper, may be obtained at the world wide web site http://www.spc.uchicago.edu/E-Kyriazidou.

² Obviously, the analysis is similar for any kind of data that have a group structure.


The most typical concern in empirical work using panel data has been the presence of unobserved heterogeneity. Heterogeneity across economic agents may arise, for example, as a result of different preferences, endowments, or attributes. These permanent individual characteristics are commonly unobservable, or may simply not be measurable due to their qualitative nature. Failure to account for such individual-specific effects may result in biased and inconsistent estimates of the parameters of interest. In linear panel data models these unobserved effects may be differenced out using the familiar within (fixed-effects) approach. This method is generally not applicable in limited dependent variable models. Exceptions include the discrete choice model studied by Rasch (1960, 1961), Andersen (1970), and Manski (1987), and the censored and truncated regression models (Honoré (1992, 1993)). See also Chamberlain (1984) and Hsiao (1986) for a discussion of panel data methods.

The simultaneous presence of sample selectivity and unobserved heterogeneity has been noted in empirical work (as, for example, in Hausman and Wise (1979), Nijman and Verbeek (1992), and Rosholm and Smith (1994)). Given the pervasiveness of either problem in panel data studies, it appears highly desirable to be able to control for both of them simultaneously. The present paper is a step in this direction.

In particular, we consider the problem of estimating a panel data model where both the sample selection rule, assumed to follow a binary response model, and the (linear) regression equation of interest contain additive permanent unobservable individual-specific effects that may depend on the observable explanatory variables in an arbitrary way. In this type 2 Tobit model (in the terminology of Amemiya (1985)), sample selectivity induces a fundamental nonlinearity in the equation of interest with respect to the unobserved characteristics, which, in contrast to linear panel data models, cannot be differenced away. This is because the sample selection effect, which enters additively in the main equation, is a (generally unknown) nonlinear function of both the observed time-varying regressors and the unobservable individual effects of the selection equation, and is therefore not constant over time.

Furthermore, even if one were willing to specify the distribution of the underlying time-varying errors (for example, normal) in order to estimate the model by maximum likelihood, the presence of unobservable effects in the selection rule would require that the researcher also specify a functional form for their statistical dependence on the observed variables. Apart from being nonrobust to distributional misspecification, this fully parametric random effects approach is also computationally cumbersome, as it requires multiple numerical integration over both the unobservable effects and the entire length of the panel. Heckman's (1976, 1979) two-step correction, although computationally much more tractable, also requires full specification of the underlying distributions of the unobservables, and is therefore susceptible to inconsistencies due to misspecification. Thus, the results of this paper will be important even if the distribution of the individual effects is the only nuisance parameter in the model.


Panel data selection models with latent individual effects have been most recently considered by Verbeek and Nijman (1992) and Wooldridge (1995), who proposed methods for testing and correcting for selectivity bias. A crucial assumption underlying these methods is the parameterization of the sample selection mechanism. Specifically, these authors assume that both the unobservable effect and the idiosyncratic errors in the selection process are normally distributed. The present paper is an important departure from this work, in the sense that the distributions of all unobservables are left unspecified.

We focus on the case where the data consist of a large number of individuals observed through a small number of time periods, and analyze asymptotics as the number of individuals ($n$) approaches infinity. Short-length panels are not only the most relevant for practical purposes; they also pose problems in estimation. In such cases, even if the individual effects are treated as parameters to be estimated, a parametric maximum likelihood approach yields inconsistent estimates: the well known incidental parameters problem.

Our method for estimating the main regression equation of interest follows the familiar two-step approach proposed by Heckman (1974, 1976) for parametric selection models, which has been used in the construction of most semiparametric estimators for such models. In the first step, the unknown coefficients of the selection equation are consistently estimated. In the second step, these estimates are used to estimate the equation of interest by a weighted least squares regression. The fixed effect from the main equation is eliminated by taking time differences on the observed selected variables, while the first-step estimates are used to construct weights, whose magnitude depends on the magnitude of the sample selection bias. For a fixed sample size, observations with less selectivity bias are given more weight, while asymptotically, only those observations with zero bias are used. This idea has been used by Powell (1987) and Ahn and Powell (1993) for the estimation of cross sectional selection models.

The intuition is that, for an individual that is selected into the sample in two time periods, it is reasonable to assume that the magnitude of the selection effect in the main equation will be the same if the observed variables determining selection remain constant over time. Therefore, time differencing the outcome equation will eliminate not only its unobservable individual effect but also the sample selection effect. In fact, by imposing a linear regression structure on the latent model underlying the selection mechanism, the above argument will also hold if only the linear combination of the observed selection covariates, known up to a finite number of estimable parameters, remains constant over time. Under appropriate assumptions on the rate of convergence of the first step estimator, the proposed estimator of the main equation of interest is shown to be consistent and asymptotically normal, with a rate of convergence that can be made arbitrarily close to $n^{-1/2}$. In particular, by assuming that the selection equation is estimated at a faster rate than the main equation, we obtain a limiting distribution which does not depend on the distribution of the first step estimator.


The first step of the proposed estimation method requires that the discrete choice selection equation be estimated consistently and at a sufficiently fast rate. To this end, we propose using a smoothed version of Manski's (1987) conditional maximum score estimator,³ which follows the approach taken by Horowitz (1992) for estimating cross section discrete choice models. Under appropriate assumptions, stronger than those in Manski (1987), the smoothed estimator improves on the rate of convergence of the original estimator, and also allows standard statistical inference. Furthermore, it dispenses with parametric assumptions on the distribution of the errors, required for example by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970).

Although our analysis is based on the assumption of a censored panel with only two observations per individual, it easily generalizes to the case of a longer and possibly unbalanced panel, and may also be modified to accommodate truncated samples, in which case estimation of the selection equation is infeasible. Extensions of our estimation method to cover these situations are discussed at the end of the next section.

The paper is organized as follows. Section 2 describes the model and motivates the proposed estimation procedure. Section 3 states the assumptions and derives the asymptotic properties of the estimator. Section 4 presents the results of a Monte Carlo study investigating the small sample performance of the proposed estimator. Section 5 offers conclusions and suggests topics for future research. The proofs of theorems and lemmata are given in the Appendix.

2. THE MODEL AND THE PROPOSED ESTIMATOR

We consider the following model:

$$y_{it} = x_{it}\beta + \alpha_{it} + \varepsilon_{it} \equiv d_{it}\,y_{it}^{*} = d_{it}\,x_{it}^{*}\beta + d_{it}\,\alpha_i^{*} + d_{it}\,\varepsilon_{it}^{*}, \tag{2.1}$$

$$d_{it} = 1\{w_{it}\gamma + \eta_i - u_{it} \ge 0\}. \tag{2.2}$$

Here, $\beta \in \Re^k$ and $\gamma \in \Re^q$ are unknown parameter vectors which we wish to estimate,⁴ $x_{it}^{*}$ and $w_{it}$ are vectors of explanatory variables (with possibly common elements), $\alpha_i^{*}$ and $\eta_i$ are unobservable time-invariant individual-specific effects⁵ (possibly correlated with the regressors and the errors), $\varepsilon_{it}^{*}$ and $u_{it}$ are unobserved disturbances (not necessarily independent of each other), while $y_{it}^{*} \in \Re$ is a latent variable whose observability depends on the outcome of the indicator variable $d_{it} \in \{0, 1\}$. In particular, it is assumed that, while $(d_{it}, w_{it})$ is always observed, $(y_{it}^{*}, x_{it}^{*})$ is observed only⁶ if $d_{it} = 1$. In other words, the selection variable $d_{it}$ determines whether the $it$th observation in equation (2.1) is censored or not. Thus, our problem is to estimate $\beta$ and $\gamma$ from a sample consisting of quadruples $(d_{it}, w_{it}, y_{it}, x_{it})$. We will denote the vector of (observed and unobserved) explanatory variables by $\zeta_i \equiv (w_{i1}, w_{i2}, x_{i1}^{*}, x_{i2}^{*}, \alpha_i^{*}, \eta_i)$. Notice that, without the fixed effects $\alpha_i^{*}$ and $\eta_i$, our model becomes a panel data version of the well known sample selection model considered in the literature, and could be estimated by any of the existing methods. Without sample selectivity, that is, with $d_{it} = 1$ for all $i$ and $t$, equation (2.1) is the standard panel data linear regression model.

³ The smoothed conditional maximum score estimator for binary response panel data models, along with its asymptotic properties and necessary assumptions, is presented in an earlier version of this paper (Kyriazidou (1994)). See also Charlier, Melenberg, and van Soest (1995).

⁴ Obviously, constants cannot be identified in either equation, since they would be absorbed in the individual effects.

⁵ These will be treated as nuisance parameters and will not be estimated. Our analysis also applies to the case where $\alpha_i^{*} = \eta_i$.
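The data structure just described can be sketched in code. The following is a minimal simulation of a two-period panel of the form (2.1)-(2.2); the function name `simulate_panel`, the normal errors, the coefficient values, and the way the effects are correlated with the regressors are all illustrative assumptions, not the design used in the paper.

```python
import numpy as np

def simulate_panel(n, beta=1.0, gamma=(1.0, 1.0), seed=0):
    """Draw (d, w, y, x) for n individuals and T = 2 periods (hypothetical design)."""
    rng = np.random.default_rng(seed)
    gamma = np.asarray(gamma, dtype=float)
    w = rng.normal(size=(n, 2, gamma.size))            # w_it, always observed
    x_star = w[:, :, -1] + rng.normal(size=(n, 2))     # latent x*_it (k = 1)
    eta = w.mean(axis=(1, 2)) + rng.normal(size=n)     # eta_i, correlated with w
    alpha = x_star.mean(axis=1) + rng.normal(size=n)   # alpha*_i, correlated with x*
    u = rng.normal(size=(n, 2))                        # selection errors u_it
    eps = 0.8 * u + 0.6 * rng.normal(size=(n, 2))      # eps*_it, correlated with u_it
    d = (w @ gamma + eta[:, None] - u >= 0).astype(int)    # selection rule (2.2)
    y_star = beta * x_star + alpha[:, None] + eps          # latent outcome y*_it
    return d, w, d * y_star, d * x_star                    # observed quadruple

d, w, y, x = simulate_panel(1000)
```

Here `y` and `x` are set to zero whenever `d` is zero, mirroring the convention $y_{it} = d_{it}y_{it}^{*}$ and $x_{it} = d_{it}x_{it}^{*}$.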

In our setup, it is possible to estimate $\gamma$ in the discrete choice selection equation (2.2) using either the conditional maximum likelihood approach proposed by Rasch (1960, 1961) and Andersen (1970), or the conditional maximum score method proposed by Manski (1987). On the other hand, estimation of $\beta$ based on the main equation of interest (2.1) is confronted with two problems: first, the presence of the unobservable effect $\alpha_{it} \equiv d_{it}\alpha_i^{*}$; and second and more fundamental, the potential endogeneity of the regressors $x_{it} \equiv d_{it}x_{it}^{*}$, which arises from their dependence on the selection variable $d_{it}$, and which may result in selection bias.

The first problem is easily solved by noting that, for those observations that have $d_{i1} = d_{i2} = 1$, time differencing will eliminate the effect $\alpha_{it}$ from equation (2.1). This is analogous to the fixed-effects approach taken in linear panel data models. In general though, application of standard methods, e.g. OLS, on this first-differenced subsample will yield inconsistent estimates of $\beta$, due to sample selectivity. This may be seen from the population regression function for the first-differenced subsample:

$$E(y_{i1} - y_{i2} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = (x_{i1}^{*} - x_{i2}^{*})\beta + E(\varepsilon_{i1}^{*} - \varepsilon_{i2}^{*} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i).$$

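In general, there is no reason to expect that $E(\varepsilon_{it}^{*} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = 0$, or that $E(\varepsilon_{i1}^{*} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = E(\varepsilon_{i2}^{*} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$. In particular, for each time period the sample selection effect $\lambda_{it} \equiv E(\varepsilon_{it}^{*} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$ depends not only on the (partially unobservable) conditioning vector $\zeta_i$, but also on the (generally unknown) joint conditional distribution of $(\varepsilon_{it}^{*}, u_{i1}, u_{i2})$, which may differ across individuals, as well as over time for the same individual:

$$\begin{aligned}
\lambda_{it} &\equiv E(\varepsilon_{it}^{*} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) \\
&= E(\varepsilon_{it}^{*} \mid u_{i1} \le w_{i1}\gamma + \eta_i,\; u_{i2} \le w_{i2}\gamma + \eta_i,\; \zeta_i) \\
&= \Lambda\big(w_{i1}\gamma + \eta_i,\; w_{i2}\gamma + \eta_i;\; F_{it}(\varepsilon_{it}^{*}, u_{i1}, u_{i2} \mid \zeta_i)\big) \\
&= \Lambda_{it}(w_{i1}\gamma + \eta_i,\; w_{i2}\gamma + \eta_i,\; \zeta_i).
\end{aligned}$$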
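The inconsistency of first-differenced OLS on the selected subsample can be illustrated numerically. The sketch below uses a hypothetical design (normal errors, $\gamma = (1, 1)$, true $\beta = 1$, and an outcome regressor correlated with the selection index); none of these choices come from the paper. It regresses $\Delta y_i$ on $\Delta x_i$ using only individuals with $d_{i1} = d_{i2} = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 20000, 1.0

w = rng.normal(size=(n, 2, 2))                      # w_it with q = 2
eta = w.mean(axis=(1, 2)) + rng.normal(size=n)      # eta_i
x_star = w[:, :, 1] + rng.normal(size=(n, 2))       # x*_it with k = 1
alpha = x_star.mean(axis=1) + rng.normal(size=n)    # alpha*_i
u = rng.normal(size=(n, 2))                         # u_it
eps = 0.8 * u + 0.6 * rng.normal(size=(n, 2))       # eps*_it, corr(eps, u) = 0.8
d = w.sum(axis=2) + eta[:, None] - u >= 0           # selection with gamma = (1, 1)
y_star = beta * x_star + alpha[:, None] + eps

both = d[:, 0] & d[:, 1]                            # selected in both periods
dx = (x_star[:, 0] - x_star[:, 1])[both]
dy = (y_star[:, 0] - y_star[:, 1])[both]
naive_beta = float(dx @ dy / (dx @ dx))             # OLS of dy on dx, no intercept
print(naive_beta)
```

Differencing removes $\alpha_i^{*}$, but in this design $\Delta x_i$ is correlated with $\Delta w_i\gamma$ and hence with the difference in selection effects, so the printed slope should stay visibly away from the true value even in a large sample.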

⁶ Obviously, the analysis carries through to the case where $x_{it}^{*}$ is always observed, which is the case most commonly treated in the literature.


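It is convenient to rewrite the main equation (2.1) as a partially linear regression:

$$y_{it} = x_{it}\beta + \alpha_{it} + \lambda_{it} + \nu_{it},$$

where $\nu_{it} \equiv \varepsilon_{it} - \lambda_{it}$ is a new error term, which by construction satisfies $E(\nu_{it} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = 0$. The idea of our scheme for estimating $\beta$ is to difference out the nuisance terms $\alpha_{it}$ and $\lambda_{it}$ from the equation above.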

As a motivation of our estimation procedure, consider the case where $(\varepsilon_{it}^{*}, u_{it})$ is independent and identically distributed over time and across individuals, and is independent of $\zeta_i$. Under these assumptions, it is easy to see that

$$\lambda_{it} = E(\varepsilon_{it}^{*} \mid u_{it} \le w_{it}\gamma + \eta_i) = \Lambda(w_{it}\gamma + \eta_i),$$

where $\Lambda(\cdot)$ is an unknown function, the same over time and across individuals, of the single index $w_{it}\gamma + \eta_i$. Obviously, in general $\lambda_{i1} \neq \lambda_{i2}$, unless $w_{i1}\gamma = w_{i2}\gamma$. In other words, for an individual $i$ that has $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$, the sample selection effect $\lambda_{it}$ will be the same in the two periods. Thus, for this particular individual, applying first-differences in equation (2.1) will eliminate both the unobservable effect $\alpha_{it}$ and the selection effect $\lambda_{it}$. At this point it is important to notice that, even if the functional form of $\Lambda$ were known (as, for example, in the case of a bivariate normal distribution; see Heckman (1976)), it would still involve the unobservable effect $\eta_i$. This suggests that it would be generally infeasible to consistently estimate $\beta$ from (2.1), even in the absence of the effect $\alpha_{it}$ and with knowledge of $\gamma$, unless a parametric form for the distribution of $\eta_i$ conditional on the observed exogenous variables were also specified.
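As a worked illustration of the single-index form, consider the textbook special case in which $(\varepsilon_{it}^{*}, u_{it})$ is bivariate normal with mean zero, $\mathrm{Var}(u_{it}) = 1$ and $\mathrm{Cov}(\varepsilon_{it}^{*}, u_{it}) = \sigma_{\varepsilon u}$. This normality is an assumption made here only for concreteness and is not required by the method. The symbols $\varphi$ and $G$, for the standard normal density and distribution function, are introduced here.

```latex
\begin{aligned}
\Lambda(s) &= E(\varepsilon_{it}^{*} \mid u_{it} \le s)
            = -\sigma_{\varepsilon u}\,\frac{\varphi(s)}{G(s)},
            \qquad s = w_{it}\gamma + \eta_i, \\
\lambda_{i1} - \lambda_{i2}
           &= -\sigma_{\varepsilon u}\left[
              \frac{\varphi(w_{i1}\gamma + \eta_i)}{G(w_{i1}\gamma + \eta_i)}
            - \frac{\varphi(w_{i2}\gamma + \eta_i)}{G(w_{i2}\gamma + \eta_i)}
              \right].
\end{aligned}
```

Since $\varphi/G$ is strictly decreasing, with $\sigma_{\varepsilon u} \neq 0$ the difference is zero exactly when $w_{i1}\gamma = w_{i2}\gamma$, whatever the value of $\eta_i$, while each level $\lambda_{it}$ on its own still depends on the unobserved $\eta_i$.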

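The preceding argument for differencing out both nuisance terms from equation (2.1) will hold under much weaker distributional assumptions. In particular, since first-differences are taken on an individual basis, it is not required that $(\varepsilon_{it}^{*}, u_{it})$ be i.i.d. across individuals, nor that it be independent of the individual-specific vector $\zeta_i$. In other words, we may allow the functional form of $\Lambda$ to vary across individuals. It is also possible to allow for serial correlation in the errors. Consider, for example, the case where $(\varepsilon_{i1}^{*}, \varepsilon_{i2}^{*}, u_{i1}, u_{i2})$ and $(\varepsilon_{i2}^{*}, \varepsilon_{i1}^{*}, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$, i.e. $F(\varepsilon_{i1}^{*}, \varepsilon_{i2}^{*}, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon_{i2}^{*}, \varepsilon_{i1}^{*}, u_{i2}, u_{i1} \mid \zeta_i)$. Under this conditional exchangeability assumption, it is easy to see that, for an individual $i$ that has $w_{i1}\gamma = w_{i2}\gamma$,

$$\begin{aligned}
\lambda_{i1} &= E(\varepsilon_{i1}^{*} \mid u_{i1} \le w_{i1}\gamma + \eta_i,\; u_{i2} \le w_{i2}\gamma + \eta_i,\; \zeta_i) \\
&= E(\varepsilon_{i2}^{*} \mid u_{i2} \le w_{i1}\gamma + \eta_i,\; u_{i1} \le w_{i2}\gamma + \eta_i,\; \zeta_i) = \lambda_{i2}.
\end{aligned}$$

Notice that, in general, it is not sufficient to assume joint conditional stationarity of the errors. An extreme example is the case where $\varepsilon_{i1}^{*}$, $\varepsilon_{i2}^{*}$, and $u_{i1}$ are i.i.d. $N(0, 1)$ and independent of $\zeta_i$, while $u_{i2} = \varepsilon_{i1}^{*}$. Then, $\lambda_{i1} = E(\varepsilon_{i1}^{*} \mid \varepsilon_{i1}^{*} \le w_{i2}\gamma + \eta_i)$ while $\lambda_{i2} = E(\varepsilon_{i2}^{*})$, regardless of whether $w_{i1}\gamma = w_{i2}\gamma$.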


The above discussion, which presumes knowledge of the true $\gamma$, suggests estimating $\beta$ by OLS from a subsample that consists of those observations that have $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$. Defining $\Psi_i \equiv 1\{w_{i1}\gamma = w_{i2}\gamma\}$, $\Phi_i \equiv 1\{d_{i1} = d_{i2} = 1\} = d_{i1}d_{i2}$, and with $\Delta$ denoting first differences, the OLS estimator is of the form

$$\hat\beta_n = \Big[\sum_{i=1}^{n} \Delta x_i'\,\Delta x_i\,\Psi_i\Phi_i\Big]^{-1}\Big[\sum_{i=1}^{n} \Delta x_i'\,\Delta y_i\,\Psi_i\Phi_i\Big].$$

Under appropriate regularity conditions, this estimator will be consistent and root-$n$ asymptotically normal. An obvious requirement is that $\Pr(\Delta w_i\gamma = 0) > 0$, which may be satisfied, for example, when all the random variables in $w_{it}$ are discrete, or in experimental cases where the distribution of $w_{it}$ is in the control of the researcher, situations that are rare in economic applications.
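The exact-match OLS estimator above can be sketched as follows. The function name `matched_fd_ols` and the tolerance argument `tol` are illustrative assumptions; the sketch presumes that the true $\gamma$ is known and that exact ties in the index occur, as with discrete $w_{it}$.

```python
import numpy as np

def matched_fd_ols(dy, dx, dw, gamma, phi, tol=0.0):
    """OLS of dy on dx over individuals with dw @ gamma = 0 and phi = 1.

    dy: (n,) differenced outcomes, dx: (n, k) differenced regressors,
    dw: (n, q) differenced selection covariates, phi: (n,) values d_i1 * d_i2.
    """
    index_diff = dw @ gamma
    keep = (np.abs(index_diff) <= tol) & (phi == 1)    # Psi_i * Phi_i
    dx_kept, dy_kept = dx[keep], dy[keep]
    return np.linalg.solve(dx_kept.T @ dx_kept, dx_kept.T @ dy_kept)

# Toy usage: only the first two individuals are both matched and selected.
dx = np.array([[1.0], [2.0], [3.0], [1.0]])
dy = np.array([2.0, 4.0, 100.0, 50.0])
dw = np.array([[0.0], [0.0], [1.0], [0.0]])
phi = np.array([1, 1, 1, 0])
beta_hat = matched_fd_ols(dy, dx, dw, np.array([1.0]), phi)
```

With continuously distributed covariates the set of kept observations is typically empty, which is what motivates replacing the indicator by kernel weights.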

Of course, this estimation scheme cannot be directly implemented since $\gamma$ is unknown. Furthermore, as argued above, it may be the case that $\Psi_i = 0$ (i.e. $\Delta w_i\gamma \neq 0$) for all individuals in our sample. Notice though that, if $\Lambda$ is a sufficiently smooth function and $\hat\gamma_n$ is a consistent estimate of $\gamma$, observations for which the difference $\Delta w_i\hat\gamma_n$ is close to zero should also have $\Delta\lambda_i \cong 0$, and the preceding arguments would hold approximately.

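We therefore propose the following two-step estimation procedure, which is in the spirit of Powell (1987) and Ahn and Powell (1993). In the first step, $\gamma$ is consistently estimated based on equation (2.2) alone. In the second step, the estimate $\hat\gamma_n$ is used to estimate $\beta$, based on those pairs of observations for which $w_{i1}\hat\gamma_n$ and $w_{i2}\hat\gamma_n$ are close. Specifically, we propose

$$\hat\beta_n = \Big[\sum_{i=1}^{n} \hat\psi_{in}\,\Delta x_i'\,\Delta x_i\,\Phi_i\Big]^{-1}\Big[\sum_{i=1}^{n} \hat\psi_{in}\,\Delta x_i'\,\Delta y_i\,\Phi_i\Big], \tag{2.3}$$

where $\hat\psi_{in}$ is a weight that declines to zero as the magnitude of the difference $|w_{i1}\hat\gamma_n - w_{i2}\hat\gamma_n|$ increases. We choose kernel weights of the form

$$\hat\psi_{in} \equiv \frac{1}{h_n}\,K\!\left(\frac{\Delta w_i\hat\gamma_n}{h_n}\right), \tag{2.4}$$

where $K$ is a kernel density function, and $h_n$ is a sequence of bandwidths which tends to zero as $n \to \infty$. Thus, for a fixed (nonzero) magnitude of the difference $|\Delta w_i\hat\gamma_n|$, the weight $\hat\psi_{in}$ shrinks as the sample size increases, while for a fixed $n$, a larger $|\Delta w_i\hat\gamma_n|$ corresponds to a smaller weight.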
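The second step can be sketched in a few lines. This is a minimal implementation of a kernel-weighted first-difference regression of the form (2.3)-(2.4), assuming a first-step estimate `gamma_hat` is already available; the function names and the default Gaussian kernel are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def gaussian_kernel(v):
    """Standard normal density, used here as an example kernel K."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)

def weighted_fd_estimator(dy, dx, dw, gamma_hat, phi, h, kernel=gaussian_kernel):
    """Kernel-weighted least squares of dy on dx, as in (2.3)-(2.4).

    dy: (n,) differenced outcomes, dx: (n, k) differenced regressors,
    dw: (n, q) differenced selection covariates, phi: (n,) values d_i1 * d_i2,
    h: bandwidth h_n.
    """
    index_diff = dw @ gamma_hat                  # estimated index difference
    psi = kernel(index_diff / h) / h * phi       # weights (2.4), zero if not selected twice
    weighted_dx = dx * psi[:, None]
    s_xx = weighted_dx.T @ dx                    # sum of psi * dx' dx * phi
    s_xy = weighted_dx.T @ dy                    # sum of psi * dx' dy * phi
    return np.linalg.solve(s_xx, s_xy)
```

When `dy` equals `dx @ beta` exactly, the function returns `beta` for any positive weights, which is a convenient sanity check before applying it to data.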

It is interesting to note that the arguments used in estimating the main regression equation may be modified to accommodate the case of a truncated sample, that is, when we only observe those individuals that have $d_{it} = 1$ for all time periods. Recall that our method for eliminating the sample selection effect from equation (2.1) is based on the fact that, under certain distributional assumptions, $\Delta w_i\gamma = 0$ implies $\Delta\lambda_i = 0$. However, $\Delta w_i = 0$ also implies $\Delta\lambda_i = 0$. In other words, we might dispense altogether with the first step of estimating $\gamma$, and estimate $\beta$ from those observations for which $w_{i1}$ and $w_{i2}$ are close, which would suggest using the weights $\hat\psi_{in} \equiv (1/h_n)K(\Delta w_i/h_n)$. Although this approach would imply a slower rate of convergence for the resulting estimator, this estimation scheme may be used for estimating $\beta$ from a truncated sample, in which case estimation of the selection equation is infeasible. An obvious drawback of this method is that, in order to consistently estimate the entire parameter vector $\beta$, we would have to impose the restriction that $w_{it}$ and $x_{it}^{*}$ do not contain any elements in common.

The above analysis extends naturally to the case of a longer (and possibly unbalanced) panel, that is, when $T_i \ge 2$. Then $\beta$ could be estimated from those observations that have $d_{it} = d_{is} = 1$ and for which $w_{it}\hat\gamma_n$ and $w_{is}\hat\gamma_n$ are close, for all $s, t = 1, \ldots, T_i$. The estimator is of the form

where

In the following section we derive the asymptotic properties of our proposed estimator for the main equation of interest, under the assumption that $\gamma$ has been consistently estimated. At the end of the section, we examine the applicability of existing estimators for obtaining first-step estimates of the selection equation.

3. ESTIMATION OF THE MAIN EQUATION

3.1. Asymptotic Properties of the Estimator

The derivation of the large sample properties of $\hat\beta_n$ of equations (2.3) and (2.4) proceeds in two steps. First, the asymptotic behavior of the infeasible estimator, which uses the true $\gamma$ in the construction of the kernel weights, denoted by $\tilde\beta_n$, is analyzed. Then, the large sample behavior of the difference $(\hat\beta_n - \tilde\beta_n)$ is investigated.

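It will be useful to define the scalar index $W_i \equiv \Delta w_i\gamma$ and its estimated counterpart $\hat W_i \equiv \Delta w_i\hat\gamma_n$, along with the following quantities:

$$S_{xx} \equiv \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\Delta x_i\,\Phi_i, \qquad \hat S_{xx} \equiv \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}K\!\left(\frac{\hat W_i}{h_n}\right)\Delta x_i'\Delta x_i\,\Phi_i,$$

$$S_{x\nu} \equiv \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\Delta\nu_i\,\Phi_i, \qquad \hat S_{x\nu} \equiv \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}K\!\left(\frac{\hat W_i}{h_n}\right)\Delta x_i'\Delta\nu_i\,\Phi_i,$$

$$S_{x\lambda} \equiv \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\Delta\lambda_i\,\Phi_i, \qquad \hat S_{x\lambda} \equiv \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}K\!\left(\frac{\hat W_i}{h_n}\right)\Delta x_i'\Delta\lambda_i\,\Phi_i.$$

With these definitions, we can write $\tilde\beta_n - \beta = S_{xx}^{-1}(S_{x\nu} + S_{x\lambda})$ and $\hat\beta_n - \beta = \hat S_{xx}^{-1}(\hat S_{x\nu} + \hat S_{x\lambda})$.

Our asymptotic results for the infeasible estimator are based on the following assumptions. From Section 2, $\Phi_i \equiv d_{i1}d_{i2}$, $\zeta_i \equiv (w_{i1}, w_{i2}, x_{i1}^{*}, x_{i2}^{*}, \alpha_i^{*}, \eta_i)$, and $\nu_{it} \equiv d_{it}\varepsilon_{it}^{*} - E(\varepsilon_{it}^{*} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$.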

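ASSUMPTION R1: $(\varepsilon_{i1}^{*}, \varepsilon_{i2}^{*}, u_{i1}, u_{i2})$ and $(\varepsilon_{i2}^{*}, \varepsilon_{i1}^{*}, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$. That is, $F(\varepsilon_{i1}^{*}, \varepsilon_{i2}^{*}, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon_{i2}^{*}, \varepsilon_{i1}^{*}, u_{i2}, u_{i1} \mid \zeta_i)$.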

As discussed in Section 2, this conditional exchangeability assumption is crucial to our method for eliminating the sample selection effect. Although in principle we could allow $F$ to vary across individuals, it will be convenient for our analysis to assume that cross-section sampling is random.

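ASSUMPTION R2: An i.i.d. sample $\{(x_{it}^{*}, \varepsilon_{it}^{*}, \alpha_i^{*}, w_{it}, \eta_i, u_{it});\ t = 1, 2\}$ is drawn from the population. For each $i = 1, \ldots, n$ and each $t = 1, 2$, we observe $(d_{it}, w_{it}, y_{it}, x_{it})$.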

With this assumption, we may from now on drop the subscripts $i$ that denote the identity of each panel member.

ASSUMPTION R3: $E(\Delta x'\Delta x\,\Phi \mid W = 0)$ is finite and nonsingular.

Note that this assumption implicitly imposes an exclusion restriction on the set of regressors, namely that at least one of the variables in the selection equation, $w_{it}$, is not contained in $x_{it}^{*}$.

ASSUMPTION R4: The marginal distribution of the index function $W_i \equiv \Delta w_i\gamma$ is absolutely continuous, with density function $f_W$, which is bounded from above on its support and strictly positive at zero, i.e. $f_W(0) > 0$. In addition, $f_W$ is almost everywhere $r$ times ($r \ge 1$) continuously differentiable and has bounded derivatives.⁸

⁷ Observe that by definition $\Delta x_i\Phi_i = \Delta x_i^{*}\Phi_i$. Thus, although certain assumptions are stated in terms of the observed regressors $x_t$, they also hold for the latent (possibly unobserved) $x_t^{*}$.

⁸ It is possible to relax certain smoothness assumptions so that they hold only in a neighborhood of $W$ near zero, at the cost, though, of more technical detail.


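ASSUMPTION R5: The unknown function⁹ $\Lambda(w_t\gamma + \eta, w_\tau\gamma + \eta, \zeta) \equiv E(\varepsilon_t^{*} \mid d_1 = 1, d_2 = 1, \zeta) \equiv E(\varepsilon_t^{*} \mid u_t \le w_t\gamma + \eta,\ u_\tau \le w_\tau\gamma + \eta,\ \zeta) \equiv \Lambda(s_t, s_\tau, \zeta)$ satisfies $\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \tilde\Lambda\cdot(s_t - s_\tau)$, for $t, \tau = 1, 2$, where $\tilde\Lambda$ is a function of $(s_t, s_\tau, \zeta)$, i.e. $\tilde\Lambda \equiv \tilde\Lambda(s_t, s_\tau, \zeta)$, which is bounded on its support.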

This assumption is crucial to our analysis. It will be satisfied, for example, if $\Lambda$ is continuously differentiable with respect to its first two arguments, with bounded first-order partial derivatives (as, for example, when the errors are jointly normally distributed), in which case we may apply the multivariate mean-value theorem:

$$\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \big(\Lambda^{(1)}(c^{*}) - \Lambda^{(2)}(c^{*})\big)\cdot(s_t - s_\tau).$$

Here $\Lambda^{(j)}$ ($j = 1, 2$) denotes the first-order partial derivative of $\Lambda$ with respect to its first and second argument, respectively, and $c^{*}$ lies on the line segment connecting $(w_t\gamma + \eta, w_\tau\gamma + \eta, \zeta)$ and $(w_\tau\gamma + \eta, w_t\gamma + \eta, \zeta)$. Thus, in this case, $\tilde\Lambda \equiv \Lambda^{(1)}(c^{*}) - \Lambda^{(2)}(c^{*})$, and by assumption will be bounded.

ASSUMPTION R6: (a) $x_t^{*}$ and $\varepsilon_t^{*}$ have bounded $4 + 2\delta$ moments conditional on $W$, for any $\delta \in (0, 1)$.

(b) $E(\Delta x'\Delta x\,\Phi \mid W)$ and $E(\Delta x'\Delta x\,\Delta\nu^2\,\Phi \mid W)$ are continuous at $W = 0$ and do not vanish.

(c) $E(\Delta x'\,\Phi\,\tilde\Lambda \mid W)$ is almost everywhere $r$ times continuously differentiable as a function of $W$, and has bounded derivatives.

ASSUMPTION R7: The function $K: \Re \to \Re$ satisfies: (a) $\int K(\nu)\,d\nu = 1$, (b) $\int |K(\nu)|\,d\nu < \infty$, (c) $\sup_\nu |K(\nu)| < \infty$, (d) $\int |\nu^{r+1}|\,|K(\nu)|\,d\nu < \infty$, and (e) $\int \nu^{j}K(\nu)\,d\nu = 0$, for all $j = 1, \ldots, r$.
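The moment conditions of Assumption R7 can be checked numerically for a candidate kernel. The sketch below uses a Gaussian-based kernel, $K(\nu) = \tfrac{1}{2}(3 - \nu^2)\varphi(\nu)$ with $\varphi$ the standard normal density, which satisfies condition (e) for $r = 3$. This kernel is an illustrative choice, not one prescribed by the paper, and the grid-based integration is only an approximation.

```python
import numpy as np

def gaussian_pdf(v):
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)

def fourth_order_kernel(v):
    """Gaussian-based kernel whose first, second and third moments are zero."""
    return 0.5 * (3.0 - v**2) * gaussian_pdf(v)

def kernel_moment(kernel, j, lo=-12.0, hi=12.0, num=240001):
    """Approximate the integral of v**j * K(v) by a Riemann sum on a fine grid."""
    v = np.linspace(lo, hi, num)
    step = v[1] - v[0]
    return float(np.sum(v**j * kernel(v)) * step)

# Moments of order 0 to 4: the kernel integrates to one, the next three
# moments vanish, and the fourth moment is the first nonzero one.
moments = [kernel_moment(fourth_order_kernel, j) for j in range(5)]
```

The kernel takes negative values for $|\nu| > \sqrt{3}$, which is the price of removing the second moment.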

ASSUMPTION R8: $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$.

From our analysis in Section 2, it is easy to see that Assumptions R1-R3 would suffice to identify $\beta$ for known $\gamma$. An identification scheme in the spirit of our discussion in Section 2 would obviously require support of $W$ at zero, as well as nonsingularity of the matrix $\Sigma_{xx}$ imposed by Assumption R3, analogous to the familiar full rank assumption.

The continuity of the distribution of the index $W$ imposed in Assumption R4 is a regularity condition common in kernel estimation of density and regression functions. It is precisely this continuity that renders the OLS estimator of $\beta$ of Section 2 infeasible, even if $\gamma$ were known.

⁹ Notice that by Assumption R1 the functional form of $\Lambda$ is the same over time for the same individual, while by Assumption R2 it is also the same across individuals.

¹⁰ In principle we could dispense with the assumption that $\tilde\Lambda$ is bounded, by assuming that it has finite fourth moment conditional on $W$.


Since our estimation scheme is based on pairs of observations for which $W_i \equiv \Delta w_i\gamma \cong 0$, it is obvious that additional smoothness conditions are required. These are imposed by Assumptions R4-R8. Notice in particular Assumption R5, which imposes a Lipschitz continuity property on the selection correction function $\Lambda(\cdot)$. It is easy to see that simple continuity will not be sufficient to guarantee that $\Delta\lambda_i \to 0$ as $W_i \to 0$, since $\Delta\lambda_i$ is not a function of $W_i$. Furthermore, similarly to kernel density and regression estimation, a high order of differentiability $r$ for certain functions of the index $W$, along with the appropriate choice of the kernel function and the bandwidth sequence, imply a faster rate of convergence in distribution for $\tilde\beta_n$. Specifically, we choose a $(r + 1)$th order bias-reducing kernel, which by Assumption R7(e) is required to be negative in part of its domain.

The next lemma establishes the asymptotic properties of the infeasible estimator $\tilde\beta_n$.

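LEMMA 1: Let Assumptions R1-R8 hold. Define

$$\Sigma_{xx} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Phi \mid W = 0),$$

$$\Sigma_{x\nu} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Delta\nu^2\,\Phi \mid W = 0)\int K(\nu)^2\,d\nu,$$

$$\Sigma_{x\lambda} \equiv \frac{1}{r!}\,g^{(r)}(0)\int \nu^{r+1}K(\nu)\,d\nu,$$

where $g^{(r)}(0)$ is the $(k \times 1)$ vector of $r$th-order derivatives of

$$g(W) \equiv E(\Delta x'\,\Phi\,\tilde\Lambda \mid W)\,f_W(W)$$

evaluated at $W = 0$. Then:

(a) $S_{xx} \overset{p}{\to} \Sigma_{xx}$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}\,S_{x\nu} \overset{d}{\to} N(0, \Sigma_{x\nu})$ and (ii) $\sqrt{nh_n}\,S_{x\lambda} \overset{p}{\to} \tilde h\,\Sigma_{x\lambda}$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}S_{x\nu} \overset{p}{\to} 0$ and (ii) $h_n^{-(r+1)}S_{x\lambda} \overset{p}{\to} \Sigma_{x\lambda}$.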

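The asymptotic properties of $\tilde\beta_n$ easily follow from the previous Lemma. If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}\,(\tilde\beta_n - \beta) \overset{d}{\to} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\tilde\beta_n - \beta) \overset{p}{\to} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.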

In order to derive the asymptotic properties of the feasible estimator $\hat\beta_n$, we will make the following additional assumptions.

ASSUMPTION R9: In addition to the conditions of Assumption R7, the kernel function satisfies: (a) $K(\nu)$ is three times continuously differentiable with bounded derivatives, and (b) $\int |K'(\nu)|\,d\nu$, $\int |K''(\nu)|\,d\nu$, $\int \nu^2|K'(\nu)|\,d\nu$, and $\int \nu^2|K''(\nu)|\,d\nu$ are finite.


The conditions of Assumption R9 are satisfied, for example, for $K(\nu)$ being the standard normal density function, which is a second order kernel.

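ASSUMPTION R10: $x_t^{*}$, $\varepsilon_t^{*}$, and $w_t$ have bounded $8 + 4\delta$ moments conditional on $W$, for some $\delta \in (0, 1)$. In addition, $E(\Delta x_l\,\Delta\nu\,\Delta w_j\,\Phi \mid W)$ and $E(\Delta x_l\,\Delta\nu\,\Delta w_j\,\Delta w_m\,\Phi \mid W)$ are continuous at $W = 0$, for all $l = 1, \ldots, k$ and $j, m = 1, \ldots, q$.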

ASSUMPTION R11: The parameter vector $\gamma$ in the selection equation lies in a compact¹¹ set, and $\hat\gamma_n$ is a consistent estimator that satisfies $\hat\gamma_n - \gamma = O_p(n^{-p})$, where $2/5 < p \le 1/2$.

For example, $p = 1/2$ if $\gamma$ is estimated by maximizing the conditional likelihood function.

ASSUMPTION R12: $h_n = h\,n^{-\mu}$, where $0 < h < \infty$ and $1 - 2p < \mu < p/2$.

Assumption R12 is crucial for establishing the result that follows. This result states that $\hat S_{xx}$, $\hat S_{xv}$, and $\hat S_{x\Lambda}$ have the same probability limits as their infeasible counterparts $S_{xx}$, $S_{xv}$, and $S_{x\Lambda}$, provided that the bandwidth sequence $h_n$ is chosen appropriately for any given rate of convergence of the first-step estimator, that is, for any given $p$, and for any degree of smoothness $r$.

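LEMMA 2: Let Assumptions R1-R12 hold. Then:

(a) $\hat S_{xx} - S_{xx} = o_p(1)$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}(\hat S_{xv} - S_{xv}) = o_p(1)$ and (ii) $\sqrt{nh_n}(\hat S_{x\Lambda} - S_{x\Lambda}) = o_p(1)$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}(\hat S_{xv} - S_{xv}) = o_p(1)$ and (ii) $h_n^{-(r+1)}(\hat S_{x\Lambda} - S_{x\Lambda}) = o_p(1)$.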

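Lemma 2 readily implies that if $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\hat\beta_n - \tilde\beta_n) = o_p(1)$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \tilde\beta_n) = o_p(1)$. Since $(\hat\beta_n - \beta) = (\hat\beta_n - \tilde\beta_n) + (\tilde\beta_n - \beta)$, we have the following theorem.

THEOREM 1: Let Assumptions R1-R12 hold.

(a) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then $\sqrt{nh_n}(\hat\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\Lambda},\ \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\Lambda}$.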

11 Compactness of the parameter space is required for consistency of both Manski's estimator and the smoothed conditional maximum score estimator, while it is not required for the conditional maximum likelihood estimator. Notice, though, that since $\gamma$ can only be estimated up to scale, we can always normalize it so that it lies on the unit circle. Thus, the compactness assumption is not restrictive.


Thus, in the limit, the fact that we are using $\hat\gamma_n$ to estimate $\beta$ does not affect the asymptotic distribution of $\hat\beta_n$. The lower bound on $\mu$ imposed by Assumption R12 is the key for this result to hold. In words, this bound implies that $\beta$ is estimated at a rate slower than $\gamma$. Indeed, from Theorem 1 the rate of convergence of $\hat\beta_n$ is $(nh_n)^{-1/2} = n^{-(1-\mu)/2}$, which is obviously slower than $n^{-p}$ since $\mu > 1 - 2p$. Thus, in effect, Assumption R12 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$.
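The rate comparison above can be checked with a few lines of exact arithmetic. The following is only an illustrative sketch: the function names are hypothetical (not from the paper), and it assumes the bandwidth has the form $h_n = h\,n^{-\mu}$ of Assumption R12, so that $(nh_n)^{-1/2}$ is proportional to $n^{-(1-\mu)/2}$.

```python
from fractions import Fraction


def admissible_mu_interval(p):
    """Open interval (1 - 2p, p/2) of bandwidth exponents allowed by
    Assumption R12 for a first-step rate n^(-p); None if the interval is empty."""
    p = Fraction(p)
    lower, upper = 1 - 2 * p, p / 2
    return (lower, upper) if lower < upper else None


def beta_rate_exponent(mu):
    """Exponent a such that (n * h_n)^(-1/2) is proportional to n^(-a)
    when h_n = h * n^(-mu)."""
    return (1 - Fraction(mu)) / 2


# Root-n first step (p = 1/2): mu may lie anywhere in (0, 1/4).
interval_root_n = admissible_mu_interval(Fraction(1, 2))
# At the boundary p = 2/5 the interval (1/5, 1/5) is empty.
interval_boundary = admissible_mu_interval(Fraction(2, 5))
```

With $p = 1/2$ and $\mu = 1/5$, the second-step exponent is $2/5 < 1/2$, so $\beta$ is indeed estimated more slowly than $\gamma$; the interval is non-empty only when $p > 2/5$, which is the lower bound in Assumption R11.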

In principle, we could allow $\beta$ to be estimated at the same rate as $\gamma$. Thus, if $\sqrt{nh_n}(\hat\gamma_n - \gamma) = O_p(1)$, for $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ we obtain the following asymptotic representation, which may be easily derived from the analysis of Lemma 2(b) in the Appendix:

where

n

0= ( l h ~ ) ~ ( ~ i h )plim (ln) Ax Awi Ahi Qi i = 1

provided that E(dx l A W ~ ~ I W ) at W = O and vK(v) -+O asis continuous lvl -f m Asymptotic normality of fir may still be established if K i q - y ) has an asymptotic representation of the form Jnh (T i J - y ) = l

K c ~ ( A ~ Ad y ) + 0(1)~ At first glance it looks attractive to eliminate the asymptotic bias of fin by

choosing $h_n$ so that $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h = 0$, or equivalently by setting $\mu > 1/(2(r+1)+1)$. In that case, however, the rate of convergence of $\hat\beta_n$ is lower than when $\tilde h > 0$. Indeed, the rate of convergence in distribution of $\hat\beta_n$ is maximized by making $\mu$ as small as possible, that is, by setting $\mu = 1/(2(r+1)+1)$, in which case it becomes $n^{-(r+1)/(2(r+1)+1)}$. Thus, for $r$ large enough, the estimator converges at a rate that can be arbitrarily close to $n^{-1/2}$, provided also that $\gamma$ is estimated fast enough, that is, provided $p > (r+1)/(2(r+1)+1)$.
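To make the dependence on the smoothness order concrete, the sketch below tabulates the bandwidth exponent $\mu = 1/(2(r+1)+1)$ and the resulting rate exponent $(r+1)/(2(r+1)+1)$ for a given $r$. The helper name is hypothetical, and the formulas are taken directly from the paragraph above.

```python
from fractions import Fraction


def optimal_exponents(r):
    """Return (mu, rate) for smoothness order r, where h_n = h * n^(-mu) with
    mu = 1/(2(r+1)+1) and the estimator converges at rate n^(-rate) with
    rate = (r+1)/(2(r+1)+1). The first step must satisfy p > rate."""
    if r < 1:
        raise ValueError("r must be a positive integer")
    denom = 2 * (r + 1) + 1
    return Fraction(1, denom), Fraction(r + 1, denom)


# r = 1 (second order kernel), r = 3 (fourth order), r = 5 (sixth order)
exponents = {r: optimal_exponents(r) for r in (1, 3, 5)}
```

For $r = 1, 3, 5$ this gives bandwidth exponents $1/5$, $1/9$, $1/13$ and rate exponents $2/5$, $4/9$, $6/13$, each strictly below $1/2$ and increasing in $r$.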

Although the proposed estimator is asymptotically biased, it is possible to eliminate the asymptotic bias while maintaining the maximal rate of convergence, in the manner suggested by Bierens (1987).

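12 We can also derive an asymptotic representation for $\hat\beta_n$ in the case where $\gamma$ is estimated at a rate $n^{-p}$ that is slower than $1/\sqrt{nh_n}$. In this case we obtain $n^{p}(\hat\beta_n - \beta) = \Sigma_{xx}^{-1}\Omega\,n^{p}(\hat\gamma_n - \gamma) + o_p(1)$, which implies that $\hat\beta_n$ converges at the same rate as $\hat\gamma_n$, which is slower than the optimal rate obtained for the infeasible estimator $\tilde\beta_n$, that is, when $\gamma$ is known.

COROLLARY: Let $\hat\beta_n$ be the estimator with window width $h_n = h\,n^{-1/(2(r+1)+1)}$, and $\hat\beta_{n,\delta}$ the estimator with window width $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, where $\delta \in (0, 1)$. Define

$$\hat{\hat\beta}_n \equiv \frac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\,\hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}}.$$

Then $n^{(r+1)/(2(r+1)+1)}(\hat{\hat\beta}_n - \beta) \xrightarrow{d} N(0,\ h^{-1}\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.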
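The linear combination in the Corollary can be illustrated numerically. The sketch below assumes the bias-corrected estimator is $(\hat\beta_n - a_n\hat\beta_{n,\delta})/(1 - a_n)$ with $a_n = n^{-(1-\delta)(r+1)/(2(r+1)+1)}$, which is how the Corollary reads here; the function name and the numbers in the example are hypothetical. The example feeds in two scalar "estimates" whose only error is a leading bias term proportional to $h_n^{r+1}$ and shows that the combination removes it.

```python
def bias_corrected(beta_n, beta_n_delta, n, r, delta):
    """Combine the estimate based on h_n = h * n^(-1/(2(r+1)+1)) with the one
    based on h_{n,delta} = h * n^(-delta/(2(r+1)+1)) so that the leading
    h^(r+1) bias terms cancel (in the manner of Bierens (1987))."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    a_n = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    return (beta_n - a_n * beta_n_delta) / (1.0 - a_n)


# Illustrative numbers only: true beta = 1, leading bias = B * h_n^(r+1).
n, r, delta, h, B, beta_true = 1000, 1, 0.1, 1.3, 0.7, 1.0
m = 2 * (r + 1) + 1
biased = beta_true + B * (h * n ** (-1.0 / m)) ** (r + 1)
biased_delta = beta_true + B * (h * n ** (-delta / m)) ** (r + 1)
corrected = bias_corrected(biased, biased_delta, n, r, delta)
```

In this noiseless example the corrected value equals the true coefficient up to floating-point error; in real samples the correction removes only the leading asymptotic bias and, as the Monte Carlo results later in the paper indicate, it tends to increase variability.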

In order to compute $\hat\beta_n$ or $\hat{\hat\beta}_n$ in an application, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator will likely be more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant $h$. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.

For a given order of differentiability $r$ and a given sample size $n$, the results of Theorem 1 suggest that $h_n = h\,n^{-\mu}$ be chosen so that $\mu = 1/(2(r+1)+1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant $h$. A natural way to proceed (see Horowitz (1992) and Härdle (1990)) is to choose $h$ so as to minimize some kind of measure of the distance of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

$$\mathrm{MSE} = n^{-2(r+1)/(2(r+1)+1)}\,\mathrm{trace}\left[\Sigma_{xx}^{-1}\left(h^{2(r+1)}\,\Sigma_{x\Lambda}\Sigma_{x\Lambda}' + h^{-1}\,\Sigma_{xv}\right)\Sigma_{xx}^{-1}A\right]$$

for any nonstochastic positive semidefinite matrix $A$ that satisfies $\Sigma_{x\Lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\Lambda} > 0$. It is straightforward to show that MSE is minimized by setting

$$h = h^* = \left[\frac{\mathrm{trace}\left[\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}A\right]}{2(r+1)\,\Sigma_{x\Lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\Lambda}}\right]^{1/(2(r+1)+1)}. \tag{3.21}$$

This last expression suggests that we may construct a consistent estimate of $h^*$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\Lambda}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{xv}$ and $\Sigma_{x\Lambda}$.
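For intuition, the bandwidth constant of equation (3.21) can be evaluated in the scalar case. The following is a sketch under stated assumptions, not the paper's implementation: it takes $k = 1$ and a scalar weight $A = a$, and it assumes the rescaled asymptotic MSE has the form $h^{2(r+1)}\,a\,\Sigma_{x\Lambda}^2/\Sigma_{xx}^2 + h^{-1}\,a\,\Sigma_{xv}/\Sigma_{xx}^2$ (squared bias plus variance). All function and argument names are hypothetical, and the numerical inputs are made up.

```python
def scaled_mse(h, sigma_xx, sigma_xv, sigma_xlam, r, a=1.0):
    """Rescaled asymptotic MSE in the scalar case (the common factor
    n^(-2(r+1)/(2(r+1)+1)) is dropped): squared bias plus variance."""
    bias_sq = a * sigma_xlam ** 2 / sigma_xx ** 2
    variance = a * sigma_xv / sigma_xx ** 2
    return h ** (2 * (r + 1)) * bias_sq + variance / h


def optimal_bandwidth_constant(sigma_xx, sigma_xv, sigma_xlam, r, a=1.0):
    """Minimizer of scaled_mse over h > 0, i.e. the scalar version of (3.21)."""
    bias_sq = a * sigma_xlam ** 2 / sigma_xx ** 2
    variance = a * sigma_xv / sigma_xx ** 2
    if bias_sq == 0.0:
        raise ValueError("the asymptotic bias term must be nonzero")
    return (variance / (2 * (r + 1) * bias_sq)) ** (1.0 / (2 * (r + 1) + 1))


# Made-up inputs: Sigma_xx = 2, Sigma_xv = 3, Sigma_xLambda = 0.5, r = 1.
h_star = optimal_bandwidth_constant(2.0, 3.0, 0.5, r=1)
```

The first-order condition $2(r+1)h^{2(r+1)-1}\cdot\text{bias}^2 = h^{-2}\cdot\text{variance}$ gives the closed form used above; in practice the three $\Sigma$ quantities would be replaced by consistent estimates.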

THEOREM 2:¹³ Assume that Assumptions R1-R12 hold. (a) Let $\hat\beta_n$ be a consistent estimator of $\beta$ based on $h_n = h\,n^{-1/(2(r+1)+1)}$, and define $\hat e_i = \Delta y_i - \Delta x_i\hat\beta_n$.

Then

(b) Let $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then, for $\hat e_i$ defined as in part (a),

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's world wide web page.

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given $r$ and $n$, choose any $h_n = h\,n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, with $h$ an arbitrary positive constant and $0 < \delta < 1$. Compute $\hat\beta_n$ based on $h_n$, and construct $\hat e_i$ as defined in Theorem 2. Use $\hat e_i$ to compute the estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\Lambda}$ as discussed above. Then estimate $h^*$ by $\hat h^*$, using equation (3.21) with $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\Lambda}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h^*$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the "plug-in" method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant $h$. Second, by specifying the order of differentiability $r$, the researcher is restricted to a certain smoothness class.

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h\,n^{-1/(2(r+1)+1)}$, run an OLS regression of $\tilde y_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta y_i\,\Phi_i$ on $\tilde x_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta x_i\,\Phi_i$, and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix using the residuals from the regression, $\tilde e_i = \tilde y_i - \tilde x_i\hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\tilde y_i$ on $\tilde x_i$ with bandwidth $h_n = h_{n,\delta}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$ in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency¹⁴ of $\hat\beta_n$ if $h_n^{-1}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e. for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$.

The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are

satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-$n$ asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3}h_n \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator which, under weak (but stronger than Manski's) assumptions, is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness

14 Consistency of $\hat\beta_n$ may be established under a weaker restriction on the rate at which $\|\hat\gamma_n - \gamma\|$ vanishes relative to $h_n$. The proof of Lemma 2(a) would then have to be modified by taking a third, instead of a first, order Taylor series expansion. This modification does not alter the basic restriction for obtaining an asymptotic distribution for $\hat\beta_n$ which does not depend on the estimation of $\gamma$ in the first step, namely that $\gamma$ has to be estimated at a faster rate than $\beta$. Notice that in this case the upper bound on $\mu$ in Assumption R12 would have to be replaced by $(6p - 1)/7$. However, this modification would affect the proof of Theorem 2, which would become unnecessarily complicated and long.


we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, the first step estimation method, the performance in practice of the proposed plug-in method for estimating the bandwidth constant, and finally the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where $\beta_0 = 1$, $\gamma_1 = \gamma_2 = 1$, $w_{1it}$ and $w_{2it}$ are independent $N(-1, 1)$ variables, $\eta_i = (w_{1i1} + w_{1i2})/2 + 2\zeta_{1i}$, with $\zeta_{1i}$ an independent variable distributed uniformly over the interval $(0, 1)$, $u_{it}$ is logistically distributed, normalized to have variance equal to 1, $x_{it} = w_{2it}$, $\alpha_i = (w_{2i1} + w_{2i2})/2 + \zeta_{2i}$, with $\zeta_{2i}$ an independent $N(0, 2)$ variable, and $\varepsilon_{it} = 0.8\,\zeta_{3it} + 0.6\,u_{it}$, with $\zeta_{3it}$ an independent standard normal variable. All data are generated i.i.d. across individuals and over time. This design implies that $\Pr(d_{i1} + d_{i2} = 1) = 0.37$ and $\Pr(d_{i1} = d_{i2} = 1) = 0.31$, so that approximately 37 percent of each sample is used in the first step estimation of the selection equation and approximately 31 percent in the second step. Each Monte Carlo experiment is performed 1000 times, while the same pseudorandom number sequences are used for each one of three different sample sizes $n$: 250, 1000, and 4000.

Table I presents the finite sample properties of the "naive" estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences, using only those individuals that are selected into the sample in both time periods, i.e. those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator, with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$, at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$
Columns: Mean Bias, Median Bias, RMSE, MAD

Panel B: Sizes of t tests
Columns: 0.01, 0.05, 0.10, 0.20

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(\nu) = \phi(\nu)$, where $\phi$ is the density of the standard normal distribution, which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h\,n^{-1/(2(r+1)+1)} = h\,n^{-1/5}$ with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.¹⁵ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_L$, which in this case will be consistent since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,¹⁶ denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator, denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$

Columns: Mean Bias, Median Bias, RMSE, MAD (without asymptotic bias correction); Mean Bias, Median Bias, RMSE, MAD (with asymptotic bias correction)

Panel A (true $\gamma$): 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B ($\hat\gamma_L$): 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C ($\hat\gamma_{CMS}$): 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D ($\hat\gamma_{SCMS4}$): 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E ($\hat\gamma_{SCMS2}$): 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098
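The left-hand-side estimates of Table II are kernel-weighted least squares on first differences, as in the OLS recipe of Section 3.2. The sketch below shows one way this could be coded for a scalar regressor, which matches the Monte Carlo design. It is a minimal illustration under assumptions, not the author's code: the function and argument names are hypothetical, `index_diff` stands for $\Delta w_i\hat\gamma_n$ (or $\Delta w_i\gamma$ when the true $\gamma$ is used), `selected` stands for $\Phi_i$, the kernel is the standard normal density, and the standard error is the usual Eicker-White sandwich from the weighted regression.

```python
import math


def gaussian_kernel(v):
    """Standard normal density, a second order kernel."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def kernel_weighted_fd_ols(dy, dx, index_diff, selected, h_const=1.0, r=1):
    """Kernel-weighted OLS of first-differenced y on first-differenced x.

    dy, dx      : first differences of outcome and (scalar) regressor
    index_diff  : first difference of the estimated selection index
    selected    : 1 if the individual is observed in both periods, else 0
    Returns (beta_hat, eicker_white_se, h_n), h_n = h_const * n^(-1/(2(r+1)+1)).
    """
    n = len(dy)
    h_n = h_const * n ** (-1.0 / (2 * (r + 1) + 1))
    rows = []
    sxx = sxy = 0.0
    for dyi, dxi, wi, phi in zip(dy, dx, index_diff, selected):
        if not phi:
            continue  # individuals not selected in both periods get zero weight
        k = gaussian_kernel(wi / h_n)
        sxx += k * dxi * dxi
        sxy += k * dxi * dyi
        rows.append((k, dxi, dyi))
    if sxx == 0.0:
        raise ValueError("no usable observations")
    beta_hat = sxy / sxx
    # Sandwich variance: sum of (x_tilde * e_tilde)^2 divided by (sum x_tilde^2)^2
    meat = sum((k * dxi * (dyi - dxi * beta_hat)) ** 2 for k, dxi, dyi in rows)
    return beta_hat, math.sqrt(meat) / sxx, h_n


# Tiny noiseless example: dy = 2 * dx for selected individuals.
dx_demo = [1.0, 2.0, -1.0, 0.5]
dy_demo = [2.0, 4.0, 99.0, 1.0]          # third value is ignored (not selected)
beta_demo, se_demo, h_demo = kernel_weighted_fd_ols(
    dy_demo, dx_demo, [0.1, -0.2, 0.0, 0.05], [1, 1, 0, 1])
```

Because the factor $1/h_n$ in the kernel weights cancels between numerator and denominator, only $K(\cdot)$ itself enters the point estimate. Observations whose index difference is far from zero relative to $h_n$ receive almost no weight.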

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E) the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window

15 In the construction of the kernel weights of both the infeasible estimator of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i=1}^{n}\Delta d_i\,1\{\Delta w_{1i}\,g_1 + \Delta w_{2i}\,g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.

17 The SCMS estimates are computed by maximizing

over all $g \in \mathbb{R}^2$ that have $|g_1| = 1$ and $g_2$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(\nu) = K_4(\nu)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(\nu) = \Phi(\nu)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5\,n^{-1/(2h+1)}$ ($h = 2$ or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).
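The grid search described in footnote 16 can be sketched as follows. This is an illustrative reading rather than the author's routine: the objective is assumed to be $(1/n)\sum_i \Delta d_i\,1\{\Delta w_{1i}g_1 + \Delta w_{2i}g_2 \ge 0\}$ with $(g_1, g_2) = (\sin g, \cos g)$, the function name and the toy data are hypothetical, and ties between grid points are broken by keeping the first maximizer encountered.

```python
import math


def cms_grid_search(dd, dw1, dw2, grid_points=2000):
    """Conditional maximum score by grid search over the unit circle.

    dd       : first differences of the selection indicator (-1, 0, or 1)
    dw1, dw2 : first differences of the two selection-equation regressors
    Returns ((g1, g2), score) with g1 = sin(g), g2 = cos(g) on an equispaced
    grid of `grid_points` angles from 0 to 2*pi.
    """
    n = len(dd)
    best_score, best_angle = -math.inf, 0.0
    for j in range(grid_points):
        angle = 2.0 * math.pi * j / (grid_points - 1)
        g1, g2 = math.sin(angle), math.cos(angle)
        score = sum(d for d, a, b in zip(dd, dw1, dw2)
                    if a * g1 + b * g2 >= 0.0) / n
        if score > best_score:  # strict: keep the first maximizer on ties
            best_score, best_angle = score, angle
    return (math.sin(best_angle), math.cos(best_angle)), best_score


# Toy data: switchers into selection lie in the positive quadrant.
dd_demo = [1, 1, -1, -1, 1, -1]
dw1_demo = [1.0, 0.0, -1.0, 0.0, 1.0, -1.0]
dw2_demo = [0.0, 1.0, 0.0, -1.0, 1.0, -1.0]
(g1_demo, g2_demo), score_demo = cms_grid_search(dd_demo, dw1_demo, dw2_demo)
```

Individuals with $\Delta d_i = 0$ contribute nothing to the objective, so only "switchers" are informative. The parameterization by an angle imposes the unit-norm normalization mentioned in footnote 15.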


width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for the infeasible estimator and its bias-corrected version using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).¹⁸

In Panels C and D we use a fourth and a sixth order bias-reducing kernel¹⁹ and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.
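The order of a bias-reducing kernel can be checked numerically from its moments: an $(r+1)$th order kernel has $\int \nu^j K(\nu)\,d\nu = 0$ for $j = 1, \ldots, r$ and a nonzero $(r+1)$th moment. The sketch below does this for mixtures of normal densities of the kind discussed by Bierens (1987). It assumes the fourth and sixth order kernels are the mixtures with weights $(1.1, -0.1)$ on variances $(1, 11)$ and weights $(1.5, -0.6, 0.1)$ on variances $(1, 4, 9)$, normalized here to integrate to one; the function names and the integration grid are illustrative choices.

```python
import math


def normal_mixture_kernel(weights, variances):
    """K(v) = sum_j weights[j] * N(0, variances[j]) density evaluated at v."""
    def kernel(v):
        return sum(w * math.exp(-v * v / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
                   for w, s2 in zip(weights, variances))
    return kernel


def kernel_moment(kernel, j, lo=-60.0, hi=60.0, steps=20000):
    """Trapezoid-rule approximation of the integral of v^j * K(v)."""
    dv = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        v = lo + i * dv
        end_weight = 0.5 if i in (0, steps) else 1.0
        total += end_weight * (v ** j) * kernel(v)
    return total * dv


def kernel_order(kernel, max_order=8, tol=1e-6):
    """Smallest j >= 1 whose j-th moment is not (numerically) zero."""
    for j in range(1, max_order + 1):
        if abs(kernel_moment(kernel, j)) > tol:
            return j
    return None


phi = normal_mixture_kernel([1.0], [1.0])
k4 = normal_mixture_kernel([1.1, -0.1], [1.0, 11.0])
k6 = normal_mixture_kernel([1.5, -0.6, 0.1], [1.0, 4.0, 9.0])
orders = (kernel_order(phi), kernel_order(k4), kernel_order(k6))
```

The moment conditions reduce to linear restrictions on the mixture weights: they sum to one, and the weighted sums of the variances (and, for the sixth order kernel, of the squared variances) are zero. Higher order kernels necessarily take negative values on part of their domain.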

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h^*$ is used, while their RMSE decreases (except in Panel IV-D). This is because, in general, $\hat h^*$ is larger than the initial constant (here the initial bandwidth constant is set equal to one²⁰). Table V displays the mean of $\hat h^*$ across 1000 replications for different specifications of the initial constant for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias, we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

19 The fourth-order kernel is $K_4(\nu) = 1.1\exp(-\nu^2/2) - 0.1\exp(-\nu^2/(2\cdot 11))(1/\sqrt{11})$, and the sixth-order kernel is $K_6(\nu) = 1.5\exp(-\nu^2/2) - 0.6\exp(-\nu^2/(2\cdot 4))(1/\sqrt{4}) + 0.1\exp(-\nu^2/(2\cdot 9))(1/\sqrt{9})$. See Bierens (1987).

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size $n = 100000$.


TABLE III

FINITE SAMPLE PROPERTIES OF THE ESTIMATOR AND ITS BIAS-CORRECTED VERSION; TRUE $\gamma$

Columns: Mean Bias, Median Bias, RMSE, MAD (without asymptotic bias correction); Mean Bias, Median Bias, RMSE, MAD (with asymptotic bias correction)

Panel A ($K(\nu) = \phi(\nu)$, $h_n = 0.5\,n^{-1/5}$): 0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B ($K(\nu) = \phi(\nu)$, $h_n = 3\,n^{-1/5}$): 0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C ($K(\nu) = K_4(\nu)$, $h_n = n^{-1/9}$): 0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D ($K(\nu) = K_6(\nu)$, $h_n = n^{-1/13}$): 0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for $n$ = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A, and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

b The estimated standard errors of the median bias estimates for $n$ = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A, and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = \hat h^*\,n^{-1/5}$, INITIAL $h = 1$, $K(\nu) = \phi(\nu)$

Columns: Mean Bias, Median Bias, RMSE, MAD (without asymptotic bias correction); Mean Bias, Median Bias, RMSE, MAD (with asymptotic bias correction)

Panel A (true $\gamma$): 0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B ($\hat\gamma_L$): 0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C ($\hat\gamma_{CMS}$): 0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D ($\hat\gamma_{SCMS4}$): 0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401

TABLE V

Columns: Initial $h = 0.5$, Initial $h = 1$, Initial $h = 2$, Initial $h = 3$

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h^*$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a normal; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$. First row of panels (Figs. 1a-1c): true $\gamma$. Note: Figures 1a, 1d, 1g: $n = 250$; Figures 1b, 1e, 1h: $n = 1000$; Figures 1c, 1f, 1i: $n = 4000$.


estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

Finally, we examine the size of t tests, where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$. To this end, we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.

TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$

Columns: 0.01, 0.05, 0.10, 0.20 (without asymptotic bias correction); 0.01, 0.05, 0.10, 0.20 (with asymptotic bias correction)

Panel A (true $\gamma$): 0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B ($\hat\gamma_L$): 0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C ($\hat\gamma_{CMS}$): 0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D ($\hat\gamma_{SCMS4}$): 0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250


5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to "difference out" both the unobservable individual effect and the sample selection effect, in a manner similar to the "fixed-effects" approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMA A1: Let $S_n = (1/n)\sum_{i=1}^{n}(1/h_n)\,L(W_i/h_n)\,Z_i\,W_i^s$, $s \ge 0$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution that has $E(|Z_i|^2 \mid W_i) < M < \infty$ for almost all $W_i$, and the function $L$ satisfies $\int |\nu^s L(\nu)|\,d\nu < M$. Then $E(S_n) = O(h_n^s)$ and $\mathrm{var}(S_n) = O(h_n^{2s}/(nh_n))$. Thus, for $s \ge 1$, $S_n \xrightarrow{p} 0$, while for $s = 0$, $S_n \xrightarrow{p} f_W(0)\,E(Z \mid W = 0)\int L(\nu)\,d\nu$, provided that $E(Z \mid W)$ is continuous at $W = 0$.


PROOF: Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

LEMMAA2 (Liapounov CLT for doublc arrays) Let = (1 l t l )~= I tiwhere an Independent sequence of scalar random ~arrables that satis$es E( (I0 var( (I lt rn var(= +

V lt aand I3= El ( 61 +0 for some 8 E (01) as n + Then Jizh~N(0 V)

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1973).

COROLLARY = where (Z U)l= 1s a random sample from a Al Let ( ( I amp)L(w~)z d~stnbutlonsuch that E(ZI W) = 0 and E(IZI 1 W) lt M lt w for almost all W E(Z2 I W) IS conhnuous at W = 0 and the functlon L satrsfies llL(v)l dv lt 53 Then KS= ( l ix)~l=amp N(0

f W ( 0 ) ~ ( Z 2 I ~ =O ) ~ L ( V ) ~ ~ V )

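PROOF OF LEMMA 1: (a) Apply Lemma A1 with $Z_i = \Delta x_{li}\,\Delta x_{ji}\,\Phi_i$ ($l, j = 1, \ldots, k$), $s = 0$, and $L(\nu) = K(\nu)$.

(b-i) Apply Lemma A2 with $\xi_{in} = c'(1/\sqrt{h_n})\,K(W_i/h_n)\,\Delta x_i'\,\Delta v_i\,\Phi_i$, where $c$ is a $k \times 1$ vector of constants such that $c'c = 1$.

(b-ii) Note that by Assumption R5, $\Delta\lambda_i = \Lambda_i W_i$. Thus we may write

$$S_{x\Lambda} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\,\Lambda_i\,W_i\,\Phi_i.$$

Therefore $E(S_{x\Lambda}) = \int (1/h_n)\,K(W/h_n)\,W\,g(W)\,dW$, where $g(W) \equiv E(\Delta x'\Lambda\Phi \mid W)\,f_W(W)$ is by assumption $r$ times continuously differentiable with derivatives that are bounded on the support of $W$, and has $g(0) < \infty$. A Taylor series expansion of $g(\cdot)$ around 0 and a change of variables, $W = \nu h_n$, lead to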


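for some $c_i$ lying between 0 and $W_i$, since $\int \nu^j K(\nu)\,d\nu = 0$ for $j = 1, \ldots, r$. Therefore, by bounded convergence,

since under our assumptions $\int |\nu|^{r+1}|K(\nu)|\,d\nu < \infty$ and by assumption $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$. Furthermore, by Lemma A1, $\mathrm{var}(S_{x\Lambda}) = O(h_n^2/(nh_n))$, which implies that $\mathrm{var}(\sqrt{nh_n}\,S_{x\Lambda}) = O(nh_n)\,O(h_n/n) = O(h_n^2) = o(1)$. Hence $\sqrt{nh_n}\,S_{x\Lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\Lambda}$.

(c-i) Note that

while by Lemma A1, $\mathrm{var}(S_{xv}) = O((nh_n)^{-1})$. Therefore $E(h_n^{-(r+1)}S_{xv}) = 0$ and $\mathrm{var}(h_n^{-(r+1)}S_{xv}) = O(h_n^{-2(r+1)}(nh_n)^{-1}) = O((nh_n^{2(r+1)+1})^{-1}) = o(1)$, since by assumption $\sqrt{nh_n}\,h_n^{r+1} \to \infty$ as $n \to \infty$. Thus $h_n^{-(r+1)}S_{xv} \xrightarrow{p} 0$.

(c-ii) From part (b-ii) above,

and

since $nh_n^{2(r+1)+1} \to \infty$ implies that $nh_n^{2r+1} \to \infty$. Thus $h_n^{-(r+1)}S_{x\Lambda} \xrightarrow{p} \Sigma_{x\Lambda}$.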

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$. (iii) In the Taylor series expansions, $c$ stands for a generic value between $W$ and

PROOF OF LEMMA 2: (a) By a Taylor series expansion, we can write

Therefore

since by assumption $\mu < p/2$, $|K'(\nu)| < M$, and $E(\|\Delta w\|\,\|\Delta x\|^2) < \infty$.

SAMPLE SELECTION MODEL 1361

(b-i) Let $\hat S^{l}_{x\nu}$ and $S^{l}_{x\nu}$ denote the $l$th $(l = 1, \ldots, k)$ elements of $\hat S_{x\nu}$ and $S_{x\nu}$, respectively. A third-order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that $A_1$ and $A_2$ are $O_p(1)$, while $A_3 = o_p(1)$. The desired result will then follow from the fact that $\mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$.

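Let $A_{1j}$ be the $j$th element $(j = 1, \ldots, q)$ of the $(1 \times q)$ vector $A_1$. Write $A_{1j} = (1/\sqrt{n})\sum_{i=1}^{n}\xi_{in}$, where $\xi_{in} = (1/\sqrt{h_n})\,K'(W_i/h_n)\,\Delta x^{l}_i\,\Delta\nu_i\,\Phi_i\,\Delta w^{j}_i$. Note that $\{\xi_{in}\}_{i=1}^{n}$ is a sequence of scalar random variables that satisfies the requirements of Corollary A1, since under our assumptions $E(|\Delta x^{l}\,\Delta\nu\,\Phi\,\Delta w^{j}|^{2+\delta} \mid W) < \infty$ for almost all $W$, while $|K'(\nu)| < M$ and $\int |K'(\nu)|\,d\nu < \infty$ imply that $\int |K'(\nu)|^{2+\delta}\,d\nu < \infty$. Therefore $A_{1j}$ is bounded in probability.

Similarly, we can show that the $jm$th element $(j, m = 1, \ldots, q)$ of the $(q \times q)$ matrix $A_2$ is also bounded in probability, by defining $\xi_{in} = (1/\sqrt{h_n})\,K''(W_i/h_n)\,\Delta x^{l}_i\,\Delta\nu_i\,\Phi_i\,\Delta w^{j}_i\,\Delta w^{m}_i$, since $E(|\Delta x^{l}\,\Delta\nu\,\Phi\,\Delta w^{j}\,\Delta w^{m}|^{2+\delta} \mid W) < \infty$ for almost all $W$, and the boundedness and absolute integrability of $K''(\nu)$ imply that $\int |K''(\nu)|^{2+\delta}\,d\nu < \infty$.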

Next observe that, since $p > 2/5$ and $\mu < p/2$ imply that $(1/2) + (7\mu/2) - 3p < 0$,

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let $\hat S^{l}_{x\lambda}$ and $S^{l}_{x\lambda}$ denote the $l$th $(l = 1, \ldots, k)$ elements of $\hat S_{x\lambda}$ and $S_{x\lambda}$, respectively. A third-order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

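We will show that $B_1$ and $B_2$ are $O_p(1)$, while $B_3 = o_p(1)$. The desired result will then follow from the fact that $1 - 2p < \mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$ and $\sqrt{n h_n}\,(\hat\gamma_n - \gamma) = O_p(n^{(1-\mu)/2 - p}) = o_p(1)$.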

Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for $r \ge 1$,

Define $\hat S^{l}_{x\nu}$ and $S^{l}_{x\nu}$ as before. A third-order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+

where $A_1$ and $A_2$ are defined as in the proof of part (b-i). As we showed there, both these quantities are bounded in probability for any $h_n$ that satisfies $h_n \to 0$ and $n h_n \to \infty$ as $n$ increases. Furthermore, from (1) above, $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$. Thus the first two terms of the sum above are $o_p(1)$. Now by (2),

(c-ii) Let $\hat S^{l}_{x\lambda}$ and $S^{l}_{x\lambda}$ be defined as before. A third-order Taylor series expansion yields

where $B_1$ and $B_2$ are defined as in the proof of part (b-ii), and as we showed there, they are bounded in probability for any $h_n$ that satisfies $n h_n \to \infty$ as $n$ increases. Thus the first two terms of the sum above are $o_p(1)$. Furthermore,

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

CHAMBERLAIN, G. (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

HECKMAN, J. J. (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables, and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 5, 475-492.

HECKMAN, J. J. (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

HONORÉ, B. E. (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

KYRIAZIDOU, E. (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

MANSKI, C. (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

MANSKI, C. (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

POWELL, J. L. (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

RASCH, G. (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.

Econometrica, Vol. 65, No. 6 (November, 1997), 1335-1364

ESTIMATION OF A PANEL DATA SAMPLE SELECTION MODEL

We consider the problem of estimation in a panel data sample selection model, where both the selection and the regression equation of interest contain unobservable individual-specific effects. We propose a two-step estimation procedure, which differences out both the sample selection effect and the unobservable individual effect from the equation of interest. In the first step, the unknown coefficients of the selection equation are consistently estimated. The estimates are then used to estimate the regression equation of interest. The estimator proposed in this paper is consistent and asymptotically normal, with a rate of convergence that can be made arbitrarily close to $n^{-1/2}$, depending on the strength of certain smoothness assumptions. The finite sample properties of the estimator are investigated in a small Monte Carlo simulation.

KEYWORDS: Sample selection, panel data, individual-specific effects.

1. INTRODUCTION

SAMPLE SELECTION IS A PROBLEM frequently encountered in applied research. It arises as a result of either self-selection by the individuals under investigation, or sample selection decisions made by data analysts. A classic example, studied in the seminal work of Gronau (1974) and Heckman (1976), is female labor supply, where hours worked are observed only for those women who decide to participate in the labor force. Failure to account for sample selection is well known to lead to inconsistent estimation of the behavioral parameters of interest, as these are confounded with parameters that determine the probability of entry into the sample. In recent years, a vast amount of econometric literature has been devoted to the problem of controlling for sample selectivity. The research, however, has almost exclusively focused on the cross-sectional data case. See Powell (1994) for a review of this literature and for references. In contrast, this paper focuses on the case where the researcher has panel or longitudinal data available. Sample selectivity is as acute a problem in panel as in cross section data. In addition, panel data sets are commonly characterized by nonrandomly missing observations due to sample attrition.

This paper is based on Chapter 1 of my thesis, completed at Northwestern University, Evanston, Illinois. I wish to thank my thesis advisor, Bo Honoré, for invaluable help and support during this project. Many individuals, among them a co-editor and two anonymous referees, have offered useful comments and suggestions, for which I am very grateful. Joel Horowitz kindly provided a computer program used in this study. An earlier version of the paper was presented at the North American Summer Meetings of the Econometric Society, June, 1994. Financial support from NSF through Grant No. SES-9210037 to Bo Honoré is gratefully acknowledged. All remaining errors are my responsibility. An Appendix, which contains a proof of a theorem not included in the paper, may be obtained at the world wide web site http://www.spc.uchicago.edu/E-Kyriazidou.

Obviously the analysis is similar for any kind of data that have a group structure

The most typical concern in empirical work using panel data has been the presence of unobserved heterogeneity. Heterogeneity across economic agents may arise, for example, as a result of different preferences, endowments, or attributes. These permanent individual characteristics are commonly unobservable, or may simply not be measurable due to their qualitative nature. Failure to account for such individual-specific effects may result in biased and inconsistent estimates of the parameters of interest. In linear panel data models, these unobserved effects may be differenced out using the familiar within (fixed-effects) approach. This method is generally not applicable in limited dependent variable models. Exceptions include the discrete choice model studied by Rasch (1960, 1961), Andersen (1970), and Manski (1987), and the censored and truncated regression models (Honoré (1992, 1993)). See also Chamberlain (1984) and Hsiao (1986) for a discussion of panel data methods.

The simultaneous presence of sample selectivity and unobserved heterogeneity has been noted in empirical work (as, for example, in Hausman and Wise (1979), Nijman and Verbeek (1992), and Rosholm and Smith (1994)). Given the pervasiveness of either problem in panel data studies, it appears highly desirable to be able to control for both of them simultaneously. The present paper is a step in this direction.

In particular, we consider the problem of estimating a panel data model, where both the sample selection rule, assumed to follow a binary response model, and the (linear) regression equation of interest contain additive permanent unobservable individual-specific effects that may depend on the observable explanatory variables in an arbitrary way. In this type 2 Tobit model (in the terminology of Amemiya (1985)), sample selectivity induces a fundamental nonlinearity in the equation of interest with respect to the unobserved characteristics, which, in contrast to linear panel data models, cannot be differenced away. This is because the sample selection effect, which enters additively in the main equation, is a (generally unknown) nonlinear function of both the observed time-varying regressors and the unobservable individual effects of the selection equation, and is therefore not constant over time.

Furthermore, even if one were willing to specify the distribution of the underlying time-varying errors (for example, normal) in order to estimate the model by maximum likelihood, the presence of unobservable effects in the selection rule would require that the researcher also specify a functional form for their statistical dependence on the observed variables. Apart from being nonrobust to distributional misspecification, this fully parametric random effects approach is also computationally cumbersome, as it requires multiple numerical integration over both the unobservable effects and the entire length of the panel. Heckman's (1976, 1979) two-step correction, although computationally much more tractable, also requires full specification of the underlying distributions of the unobservables, and is therefore susceptible to inconsistencies due to misspecification. Thus, the results of this paper will be important even if the distribution of the individual effects is the only nuisance parameter in the model.

Panel data selection models with latent individual effects have been most recently considered by Verbeek and Nijman (1992) and Wooldridge (1995), who proposed methods for testing and correcting for selectivity bias. A crucial assumption underlying these methods is the parameterization of the sample selection mechanism. Specifically, these authors assume that both the unobservable effect and the idiosyncratic errors in the selection process are normally distributed. The present paper is an important departure from this work, in the sense that the distributions of all unobservables are left unspecified.

We focus on the case where the data consist of a large number of individuals observed through a small number of time periods, and analyze asymptotics as the number of individuals ($n$) approaches infinity. Short-length panels are not only the most relevant for practical purposes; they also pose problems in estimation. In such cases, even if the individual effects are treated as parameters to be estimated, a parametric maximum likelihood approach yields inconsistent estimates, the well known incidental parameters problem.

Our method for estimating the main regression equation of interest follows the familiar two-step approach proposed by Heckman (1974, 1976) for parametric selection models, which has been used in the construction of most semiparametric estimators for such models. In the first step, the unknown coefficients of the selection equation are consistently estimated. In the second step, these estimates are used to estimate the equation of interest by a weighted least squares regression. The fixed effect from the main equation is eliminated by taking time differences on the observed selected variables, while the first-step estimates are used to construct weights whose magnitude depends on the magnitude of the sample selection bias. For a fixed sample size, observations with less selectivity bias are given more weight, while asymptotically only those observations with zero bias are used. This idea has been used by Powell (1987) and Ahn and Powell (1993) for the estimation of cross sectional selection models. The intuition is that, for an individual that is selected into the sample in two time periods, it is reasonable to assume that the magnitude of the selection effect in the main equation will be the same if the observed variables determining selection remain constant over time. Therefore, time differencing the outcome equation will eliminate not only its unobservable individual effect but also the sample selection effect. In fact, by imposing a linear regression structure on the latent model underlying the selection mechanism, the above argument will also hold if only the linear combination of the observed selection covariates, known up to a finite number of estimable parameters, remains constant over time. Under appropriate assumptions on the rate of convergence of the first step estimator, the proposed estimator of the main equation of interest is shown to be consistent and asymptotically normal, with a rate of convergence that can be made arbitrarily close to $n^{-1/2}$. In particular, by assuming that the selection equation is estimated at a faster rate than the main equation, we obtain a limiting distribution which does not depend on the distribution of the first step estimator.

The first step of the proposed estimation method requires that the discrete choice selection equation be estimated consistently and at a sufficiently fast rate. To this end, we propose using a smoothed version of Manski's (1987) conditional maximum score estimator, which follows the approach taken by Horowitz (1992) for estimating cross section discrete choice models. Under appropriate assumptions, stronger than those in Manski (1987), the smoothed estimator improves on the rate of convergence of the original estimator, and also allows standard statistical inference. Furthermore, it dispenses with parametric assumptions on the distribution of the errors, required for example by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970).
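As a rough illustration of what such a first step can look like, the sketch below maximizes a smoothed conditional maximum score criterion over unit-length coefficient vectors. The criterion form (period-to-period change in the selection indicator times a smooth stand-in for the indicator of a nonnegative index difference), the logistic smoother, the grid search, the two-regressor design, and the names `smoothed_cms_objective` and `fit_direction` are all assumptions made for this example; they are not taken from this paper, which defers the estimator's definition to Kyriazidou (1994).

```python
import numpy as np

def smoothed_cms_objective(g, d, dw, sigma):
    """Illustrative smoothed conditional maximum score criterion."""
    # Logistic cdf written with tanh for numerical stability.
    smooth_indicator = 0.5 * (1.0 + np.tanh(0.5 * (dw @ g) / sigma))
    # Individuals with d_i1 == d_i2 contribute zero automatically.
    return np.mean((d[:, 1] - d[:, 0]) * smooth_indicator)

def fit_direction(d, dw, sigma, n_grid=720):
    """Grid search over unit vectors g = (cos t, sin t); two regressors only."""
    angles = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    candidates = np.column_stack([np.cos(angles), np.sin(angles)])
    values = [smoothed_cms_objective(g, d, dw, sigma) for g in candidates]
    return candidates[int(np.argmax(values))]

# Toy data (hypothetical design): logistic errors, effect eta correlated with w.
rng = np.random.default_rng(0)
n = 4000
gamma = np.array([0.6, 0.8])                          # true coefficients, unit length
eta = rng.normal(size=n)
w = rng.normal(size=(n, 2, 2)) + eta[:, None, None]   # (individual, period, regressor)
u = rng.logistic(size=(n, 2))
d = (w @ gamma + eta[:, None] - u >= 0).astype(int)   # selection indicators
dw = w[:, 1, :] - w[:, 0, :]                          # period 2 minus period 1
gamma_dir = fit_direction(d, dw, sigma=0.3)
```

Only the direction of the coefficient vector is recovered here, since the criterion depends on the scale of `g` only through the smoothing parameter `sigma`; the unit-norm normalization is one common choice among several.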

Although our analysis is based on the assumption of a censored panel with only two observations per individual, it easily generalizes to the case of a longer and possibly unbalanced panel, and may also be modified to accommodate truncated samples, in which case estimation of the selection equation is infeasible. Extensions of our estimation method to cover these situations are discussed at the end of the next section.

The paper is organized as follows. Section 2 describes the model and motivates the proposed estimation procedure. Section 3 states the assumptions and derives the asymptotic properties of the estimator. Section 4 presents the results of a Monte Carlo study investigating the small sample performance of the proposed estimator. Section 5 offers conclusions and suggests topics for future research. The proofs of theorems and lemmata are given in the Appendix.

2. THE MODEL AND THE PROPOSED ESTIMATOR

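We consider the following model:

(2.1) $y^*_{it} = x^*_{it}\beta + \alpha^*_i + \varepsilon^*_{it}$, $\quad i = 1, \ldots, n$, $\ t = 1, 2$,

(2.2) $d_{it} = 1\{w_{it}\gamma + \eta_i - u_{it} \ge 0\}$.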

Here, $\beta \in \mathbb{R}^k$ and $\gamma \in \mathbb{R}^q$ are unknown parameter vectors which we wish to estimate, $x^*_{it}$ and $w_{it}$ are vectors of explanatory variables (with possibly common elements), $\alpha^*_i$ and $\eta_i$ are unobservable time-invariant individual-specific effects (possibly correlated with the regressors and the errors), $\varepsilon^*_{it}$ and $u_{it}$ are unobserved disturbances (not necessarily independent of each other), while $y^*_{it} \in \mathbb{R}$ is a latent variable whose observability depends on the outcome of the indicator

The smoothed conditional maximum score estimator for binary response panel data models, along with its asymptotic properties and necessary assumptions, is presented in an earlier version of this paper (Kyriazidou (1994)). See also Charlier, Melenberg, and van Soest (1995).

Obviously, constants cannot be identified in either equation, since they would be absorbed in the individual effects.

These will be treated as nuisance parameters and will not be estimated. Our analysis also applies to the case where $\alpha^*_i = \eta_i$.

variable $d_{it} \in \{0, 1\}$. In particular, it is assumed that, while $(d_{it}, w_{it})$ is always observed, $(y^*_{it}, x^*_{it})$ is observed only if $d_{it} = 1$. In other words, the selection variable $d_{it}$ determines whether the $it$th observation in equation (2.1) is censored or not. Thus, our problem is to estimate $\beta$ and $\gamma$ from a sample consisting of quadruples $(d_{it}, w_{it}, y_{it}, x_{it})$. We will denote the vector of (observed and unobserved) explanatory variables by $\zeta_i \equiv (w_{i1}, w_{i2}, x^*_{i1}, x^*_{i2}, \alpha^*_i, \eta_i)$. Notice that without the fixed effects $\alpha^*_i$ and $\eta_i$, our model becomes a panel data version of the well known sample selection model considered in the literature, and could be estimated by any of the existing methods. Without sample selectivity, that is, with $d_{it} = 1$ for all $i$ and $t$, equation (2.1) is the standard panel data linear regression model.
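To make the setup concrete, the following sketch draws a two-period panel from a toy version of this model: a latent linear outcome with an individual effect, a binary selection rule with its own individual effect, and outcomes that are observed only when the selection indicator equals one. The function name `simulate_panel`, the scalar regressors, the normal errors, and the particular way the effects are correlated with the regressors are assumptions chosen for illustration, not a design from the paper.

```python
import numpy as np

def simulate_panel(n, beta=1.0, gamma=1.0, rho=0.5, seed=0):
    """Draw a two-period panel with individual effects in both equations."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=n)                          # effect in the selection equation
    alpha = 0.5 * eta + rng.normal(size=n)            # effect in the outcome equation
    w = rng.normal(size=(n, 2)) + eta[:, None]        # selection regressor, correlated with eta
    x_star = rng.normal(size=(n, 2)) + alpha[:, None] # outcome regressor, correlated with alpha
    u = rng.normal(size=(n, 2))                       # selection error
    # Outcome error correlated with the selection error (source of selectivity).
    eps = rho * u + np.sqrt(1.0 - rho**2) * rng.normal(size=(n, 2))
    d = (gamma * w + eta[:, None] - u >= 0).astype(int)   # selection indicator
    y_star = beta * x_star + alpha[:, None] + eps          # latent outcome
    # The outcome and its regressor are recorded only for selected observations.
    y = np.where(d == 1, y_star, np.nan)
    x = np.where(d == 1, x_star, np.nan)
    return {"d": d, "w": w, "y": y, "x": x}

panel = simulate_panel(500, seed=1)
```

Each array has one row per individual and one column per time period; `d` and `w` are complete, while `y` and `x` contain `NaN` wherever `d` is zero.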

In our setup, it is possible to estimate $\gamma$ in the discrete choice selection equation (2.2) using either the conditional maximum likelihood approach proposed by Rasch (1960, 1961) and Andersen (1970), or the conditional maximum score method proposed by Manski (1987). On the other hand, estimation of $\beta$ based on the main equation of interest (2.1) is confronted with two problems: first, the presence of the unobservable effect $\alpha_{it} \equiv d_{it}\alpha^*_i$; and second and more fundamental, the potential endogeneity of the regressors $x_{it} \equiv d_{it}x^*_{it}$, which arises from their dependence on the selection variable $d_{it}$, and which may result in selection bias.

The first problem is easily solved by noting that, for those observations that have $d_{i1} = d_{i2} = 1$, time differencing will eliminate the effect $\alpha_{it}$ from equation (2.1). This is analogous to the fixed-effects approach taken in linear panel data models. In general, though, application of standard methods, e.g. OLS, on this first-differenced subsample will yield inconsistent estimates of $\beta$, due to sample selectivity. This may be seen from the population regression function for the first-differenced subsample:

$$E(y_{i1} - y_{i2} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = (x^*_{i1} - x^*_{i2})\beta + E(\varepsilon^*_{i1} - \varepsilon^*_{i2} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i).$$

In general, there is no reason to expect that $E(\varepsilon^*_{it} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = 0$, or that $E(\varepsilon^*_{i1} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = E(\varepsilon^*_{i2} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$. In particular, for each time period the sample selection effect $\lambda_{it} \equiv E(\varepsilon^*_{it} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$ depends not only on the (partially unobservable) conditioning vector $\zeta_i$, but also on the (generally unknown) joint conditional distribution of $(\varepsilon^*_{it}, u_{i1}, u_{i2})$, which may differ across individuals, as well as over time for the same individual:

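$$\begin{aligned}
\lambda_{it} &= E(\varepsilon^*_{it} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) \\
&= E(\varepsilon^*_{it} \mid u_{i1} \le w_{i1}\gamma + \eta_i,\ u_{i2} \le w_{i2}\gamma + \eta_i,\ \zeta_i) \\
&= \Lambda\big(w_{i1}\gamma + \eta_i,\ w_{i2}\gamma + \eta_i;\ F_{it}(\varepsilon^*_{it}, u_{i1}, u_{i2} \mid \zeta_i)\big) \\
&= \Lambda_{it}(w_{i1}\gamma + \eta_i,\ w_{i2}\gamma + \eta_i,\ \zeta_i).
\end{aligned}$$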

Obviously, the analysis carries through to the case where $x^*_{it}$ is always observed, which is the case most commonly treated in the literature.

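It is convenient to rewrite the main equation (2.1), for the selected observations, as a partially linear regression:

$$y_{it} = x^*_{it}\beta + \alpha^*_i + \lambda_{it} + \nu_{it},$$

where $\nu_{it} \equiv \varepsilon^*_{it} - \lambda_{it}$ is a new error term, which by construction satisfies $E(\nu_{it} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = 0$. The idea of our scheme for estimating $\beta$ is to difference out the nuisance terms $\alpha^*_i$ and $\lambda_{it}$ from the equation above.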

As a motivation of our estimation procedure, consider the case where $(\varepsilon^*_{it}, u_{it})$ is independent and identically distributed over time and across individuals, and is independent of $\zeta_i$. Under these assumptions, it is easy to see that

$$\lambda_{it} = E(\varepsilon^*_{it} \mid u_{it} \le w_{it}\gamma + \eta_i) = \Lambda(w_{it}\gamma + \eta_i),$$

where $\Lambda(\cdot)$ is an unknown function, the same over time and across individuals, of the single index $w_{it}\gamma + \eta_i$. Obviously, in general $\lambda_{i1} \ne \lambda_{i2}$, unless $w_{i1}\gamma = w_{i2}\gamma$. In other words, for an individual $i$ that has $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$, the sample selection effect $\lambda_{it}$ will be the same in the two periods. Thus, for this particular individual, applying first-differences in equation (2.1) will eliminate both the unobservable effect $\alpha^*_i$ and the selection effect $\lambda_{it}$. At this point, it is important to notice that even if the functional form of $\Lambda$ were known (as, for example, in the case of a bivariate normal distribution; see Heckman (1976)), it would still involve the unobservable effect $\eta_i$. This suggests that it would be generally infeasible to consistently estimate $\beta$ from (2.1), even in the absence of the effect $\alpha^*_i$ and with knowledge of $\gamma$, unless a parametric form for the distribution of $\eta_i$ conditional on the observed exogenous variables were also specified.

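The preceding argument for differencing out both nuisance terms from equation (2.1) will hold under much weaker distributional assumptions. In particular, since first-differences are taken on an individual basis, it is not required that $(\varepsilon^*_{it}, u_{it})$ be i.i.d. across individuals, nor that it be independent of the individual-specific vector $\zeta_i$. In other words, we may allow the functional form of $\Lambda$ to vary across individuals. It is also possible to allow for serial correlation in the errors. Consider, for example, the case where $(\varepsilon^*_{i1}, \varepsilon^*_{i2}, u_{i1}, u_{i2})$ and $(\varepsilon^*_{i2}, \varepsilon^*_{i1}, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$, i.e. $F(\varepsilon^*_{i1}, \varepsilon^*_{i2}, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon^*_{i2}, \varepsilon^*_{i1}, u_{i2}, u_{i1} \mid \zeta_i)$. Under this conditional exchangeability assumption, it is easy to see that, for an individual $i$ that has $w_{i1}\gamma = w_{i2}\gamma$,

$$\lambda_{i1} = E(\varepsilon^*_{i1} \mid u_{i1} \le w_{i1}\gamma + \eta_i,\ u_{i2} \le w_{i2}\gamma + \eta_i,\ \zeta_i) = E(\varepsilon^*_{i2} \mid u_{i2} \le w_{i1}\gamma + \eta_i,\ u_{i1} \le w_{i2}\gamma + \eta_i,\ \zeta_i) = \lambda_{i2}.$$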

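Notice that, in general, it is not sufficient to assume joint conditional stationarity of the errors. An extreme example is the case where $\varepsilon^*_{i1}$, $\varepsilon^*_{i2}$, and $u_{i1}$ are i.i.d. $N(0, 1)$ and independent of $\zeta_i$, while $u_{i2} = \varepsilon^*_{i1}$. Then $\lambda_{i1} = E(\varepsilon^*_{i1} \mid \varepsilon^*_{i1} \le w_{i2}\gamma + \eta_i) \ne \lambda_{i2} = E(\varepsilon^*_{i2})$, regardless of whether $w_{i1}\gamma = w_{i2}\gamma$.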

The above discussion, which presumes knowledge of the true $\gamma$, suggests estimating $\beta$ by OLS from a subsample that consists of those observations that have $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$. Defining $\Psi_i \equiv 1\{w_{i1}\gamma = w_{i2}\gamma\}$, $\Phi_i \equiv 1\{d_{i1} = d_{i2} = 1\} = d_{i1}d_{i2}$, and with $\Delta$ denoting first differences, the OLS estimator is of the form $\hat\beta_n = \big[\sum_{i=1}^{n} \Delta x_i'\,\Delta x_i\,\Psi_i\Phi_i\big]^{-1}\big[\sum_{i=1}^{n} \Delta x_i'\,\Delta y_i\,\Psi_i\Phi_i\big]$. Under appropriate regularity conditions, this estimator will be consistent and root-$n$ asymptotically normal. An obvious requirement is that $\Pr(\Delta w_i\gamma = 0) > 0$, which may be satisfied, for example, when all the random variables in $w_{it}$ are discrete, or in experimental cases where the distribution of $w_{it}$ is in the control of the researcher, situations that are rare in economic applications.
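The matched-subsample OLS idea can be sketched in a few lines. In the toy data below, the differenced selection term is mimicked by a multiple of the index difference, so it vanishes exactly on the rows where the index difference is zero. The function name `matched_difference_ols`, the noise-free outcome, and the three-point distribution of the index difference are assumptions made for the example, and the true index difference is treated as known, as in the discussion above.

```python
import numpy as np

def matched_difference_ols(dy, dx, dw_gamma, selected):
    """OLS of the differenced outcome on differenced regressors,
    using only rows selected in both periods with zero index difference."""
    keep = selected & (dw_gamma == 0)
    X, Y = dx[keep], dy[keep]
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Toy data (hypothetical design).
rng = np.random.default_rng(0)
n, beta = 400, np.array([1.0, -2.0])
dx = rng.normal(size=(n, 2))                          # differenced regressors
dw_gamma = rng.integers(-1, 2, size=n).astype(float)  # index difference in {-1, 0, 1}
selected = rng.random(n) < 0.7                        # selected in both periods
dy = dx @ beta + 3.0 * dw_gamma                       # 3.0 * dw_gamma mimics the differenced selection effect
dy[~selected] = np.nan                                # outcomes unobserved otherwise
beta_hat = matched_difference_ols(dy, dx, dw_gamma, selected)
```

Because the contaminating term is zero on every retained row and the example has no idiosyncratic noise, `beta_hat` reproduces `beta` up to floating-point error; with continuous selection regressors the retained subsample would typically be empty, which is the practical limitation noted in the text.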

Of course, this estimation scheme cannot be directly implemented, since $\gamma$ is unknown. Furthermore, as argued above, it may be the case that $\Psi_i = 0$ (i.e., $\Delta w_i\gamma \ne 0$) for all individuals in our sample. Notice, though, that if $\Lambda$ is a sufficiently smooth function and $\hat\gamma_n$ is a consistent estimate of $\gamma$, observations for which the difference $\Delta w_i\hat\gamma_n$ is close to zero should also have $\Delta\lambda_i \cong 0$, and the preceding arguments would hold approximately.

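We therefore propose the following two-step estimation procedure, which is in the spirit of Powell (1987) and Ahn and Powell (1993). In the first step, $\gamma$ is consistently estimated based on equation (2.2) alone. In the second step, the estimate $\hat\gamma_n$ is used to estimate $\beta$, based on those pairs of observations for which $w_{i1}\hat\gamma_n$ and $w_{i2}\hat\gamma_n$ are close. Specifically, we propose

(2.3) $\hat\beta_n = \Big[\sum_{i=1}^{n} \hat\psi_{in}\,\Delta x_i'\,\Delta x_i\,\Phi_i\Big]^{-1}\Big[\sum_{i=1}^{n} \hat\psi_{in}\,\Delta x_i'\,\Delta y_i\,\Phi_i\Big]$,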

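where $\hat\psi_{in}$ is a weight that declines to zero as the magnitude of the difference $|w_{i1}\hat\gamma_n - w_{i2}\hat\gamma_n|$ increases. We choose kernel weights of the form

(2.4) $\hat\psi_{in} = \dfrac{1}{h_n}\,K\!\left(\dfrac{\Delta w_i\hat\gamma_n}{h_n}\right)$,

where $K$ is a kernel density function, and $h_n$ is a sequence of bandwidths which tends to zero as $n \to \infty$. Thus, for a fixed (nonzero) magnitude of the difference $|\Delta w_i\hat\gamma_n|$, the weight $\hat\psi_{in}$ shrinks as the sample size increases, while for a fixed $n$, a larger $|\Delta w_i\hat\gamma_n|$ corresponds to a smaller weight.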
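A minimal sketch of this second step is given below: kernel-weighted least squares on first differences, with weights computed from the estimated index difference. The Gaussian kernel, the function name `kernel_weighted_fd`, the fixed bandwidth, and the toy data are assumptions for illustration; the first-step estimate is simply supplied as an argument rather than estimated.

```python
import numpy as np

def kernel_weighted_fd(dy, dx, dw, gamma_hat, selected, h):
    """Kernel-weighted first-difference least squares with a Gaussian kernel."""
    index = dw @ gamma_hat                                   # estimated index difference
    psi = np.exp(-0.5 * (index / h) ** 2) / (h * np.sqrt(2.0 * np.pi))  # (1/h) K(index/h)
    weight = np.where(selected, psi, 0.0)                    # zero unless selected in both periods
    dx0 = np.where(selected[:, None], dx, 0.0)               # guard against NaN in unselected rows
    dy0 = np.where(selected, dy, 0.0)
    s_xx = (dx0 * weight[:, None]).T @ dx0
    s_xy = (dx0 * weight[:, None]).T @ dy0
    return np.linalg.solve(s_xx, s_xy)

# Toy data (hypothetical design): index differences are exactly 0 or 1.
rng = np.random.default_rng(0)
n, beta = 400, np.array([1.0, -2.0])
dx = rng.normal(size=(n, 2))
dw = rng.integers(0, 2, size=(n, 1)).astype(float)
gamma_hat = np.array([1.0])                                  # stand-in for a first-step estimate
selected = rng.random(n) < 0.7
dy = dx @ beta + 3.0 * (dw @ gamma_hat)                      # second term mimics the differenced selection effect
dy[~selected] = np.nan
beta_small_h = kernel_weighted_fd(dy, dx, dw, gamma_hat, selected, h=0.05)
```

With a bandwidth of 0.05, rows whose index difference equals one receive a weight that is negligible relative to rows with a zero difference, so `beta_small_h` is essentially the matched-subsample estimate. In practice the bandwidth would shrink with the sample size rather than being fixed by hand.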

It is interesting to note that the arguments used in estimating the main regression equation may be modified to accommodate the case of a truncated sample, that is, when we only observe those individuals that have $d_{it} = 1$ for all time periods. Recall that our method for eliminating the sample selection effect from equation (2.1) is based on the fact that, under certain distributional assumptions, $\Delta w_i\gamma = 0$ implies $\Delta\lambda_i = 0$. However, $\Delta w_i = 0$ also implies $\Delta\lambda_i = 0$. In other words, we might dispense altogether with the first step of estimating $\gamma$, and estimate $\beta$ from those observations for which $w_{i1}$ and $w_{i2}$ are close, which would suggest using the weights $\psi_{in} = (1/h_n)\,K(\Delta w_i/h_n)$. Although this approach would imply a slower rate of convergence for the resulting estimator, this

1342 EKATERINI KYRIAZIDOU

estimation scheme may be used for estimating p from a truncated sample in which case estimation of the selection equation is infeasible An obvious drawback in this method is that in order to consistently estimate the entire parameter vector p we would have to impose the restriction that wit and xY do not contain any elements in common

The above analysis extends naturally to the case of a longer (and possibly unbalanced) panel, that is, when $T_i \geq 2$. Then $\beta$ could be estimated from those observations that have $d_{it} = d_{is} = 1$ and for which $w_{it}\gamma$ and $w_{is}\gamma$ are "close," for all $s, t = 1, \ldots, T_i$. The estimator is of the form

where

In the following section we derive the asymptotic properties of our proposed estimator for the main equation of interest, under the assumption that $\gamma$ has been consistently estimated. At the end of the section we examine the applicability of existing estimators for obtaining first-step estimates of the selection equation.

3. ESTIMATION OF THE MAIN EQUATION

3.1. Asymptotic Properties of the Estimator

The derivation of the large sample properties of $\hat\beta_n$ of equations (2.3) and (2.4) proceeds in two steps. First, the asymptotic behavior of the infeasible estimator, which uses the true $\gamma$ in the construction of the kernel weights, denoted by $\tilde\beta_n$, is analyzed. Then the large sample behavior of the difference $(\hat\beta_n - \tilde\beta_n)$ is investigated.

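It will be useful to define the scalar index $W_i \equiv \Delta w_i \gamma$ and its estimated counterpart $\hat W_i \equiv \Delta w_i \hat\gamma_n$, along with the following quantities:

$$S_{xx} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n} K\Big(\frac{W_i}{h_n}\Big) \Delta x_i' \Delta x_i\, \Phi_i, \quad S_{xv} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n} K\Big(\frac{W_i}{h_n}\Big) \Delta x_i' \Delta v_i\, \Phi_i, \quad S_{x\lambda} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n} K\Big(\frac{W_i}{h_n}\Big) \Delta x_i' \Delta\lambda_i\, \Phi_i,$$

with $\hat S_{xx}$, $\hat S_{xv}$, and $\hat S_{x\lambda}$ defined in the same way with $\hat W_i$ in place of $W_i$.

With these definitions, we can write $\tilde\beta_n - \beta = S_{xx}^{-1}(S_{xv} + S_{x\lambda})$ and $\hat\beta_n - \beta = \hat S_{xx}^{-1}(\hat S_{xv} + \hat S_{x\lambda})$.

Our asymptotic results for the infeasible estimator are based on the following assumptions. From Section 2, $\Phi_i \equiv d_{i1} d_{i2}$, $\zeta_i \equiv (w_{i1}, w_{i2}, x_{i1}^*, x_{i2}^*, \alpha_i^*, \eta_i)$, and $v_{it} \equiv \varepsilon_{it}^* - E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$.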

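ASSUMPTION R1: $(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2})$ and $(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$. That is, $F(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1} \mid \zeta_i)$.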

As discussed in Section 2, this conditional exchangeability assumption is crucial to our method for eliminating the sample selection effect. Although in principle we could allow $F$ to vary across individuals, it will be convenient for our analysis to assume that cross-section sampling is random.

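ASSUMPTION R2: An i.i.d. sample, $\{(x_{it}^*, \varepsilon_{it}^*, \alpha_i^*, w_{it}, \eta_i, u_{it});\ t = 1, 2\}_{i=1}^n$, is drawn from the population. For each $i = 1, \ldots, n$, and each $t = 1, 2$, we observe $(d_{it}, w_{it}, y_{it}, x_{it})$.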

With this assumption, we may from now on drop the subscripts $i$ that denote the identity of each panel member.

ASSUMPTION R3: $E(\Delta x' \Delta x \mid W = 0)$ is finite and nonsingular.

Note that this assumption implicitly imposes an exclusion restriction on the set of regressors, namely that at least one of the variables in the selection equation, $w_{it}$, is not contained in $x_{it}^*$.

ASSUMPTION R4: The marginal distribution of the index function $W \equiv \Delta w \gamma$ is absolutely continuous, with density function $f_W$, which is bounded from above on its support and strictly positive at zero, i.e. $f_W(0) > 0$. In addition, $f_W$ is almost everywhere $r$ times ($r \geq 1$) continuously differentiable and has bounded derivatives.

Observe that, by definition, $\Delta x_i = \Phi_i \Delta x_i^*$. Thus, although certain assumptions are stated in terms of the observed regressors $x$, they also hold for the latent (possibly unobserved) $x^*$.

It is possible to relax certain smoothness assumptions so that they hold only in a neighborhood of $W$ near zero, at the cost, though, of more technical detail.


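ASSUMPTION R5: The unknown function$^{9}$ $\Lambda(w_t\gamma + \eta, w_\tau\gamma + \eta, \zeta) \equiv E(\varepsilon_t^* \mid d_1 = 1, d_2 = 1, \zeta) = E(\varepsilon_t^* \mid u_t < w_t\gamma + \eta, u_\tau < w_\tau\gamma + \eta, \zeta) \equiv \Lambda(s_t, s_\tau, \zeta)$ satisfies $\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \bar\Lambda \cdot (s_t - s_\tau)$ for $t, \tau = 1, 2$, where $\bar\Lambda$ is a function of $(s_t, s_\tau, \zeta)$, i.e. $\bar\Lambda \equiv \bar\Lambda(s_t, s_\tau, \zeta)$, which is bounded on its support.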

This assumption is crucial to our analysis. It will be satisfied, for example, if $\Lambda$ is continuously differentiable with respect to its first two arguments, with bounded first-order partial derivatives (as, for example, when the errors are jointly normally distributed), in which case we may apply the multivariate mean-value theorem:

$$\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \big(\Lambda^{(1)}(c^*) - \Lambda^{(2)}(c^*)\big)\cdot(s_t - s_\tau).$$

Here $\Lambda^{(j)}$ ($j = 1, 2$) denotes the first-order partial derivative of $\Lambda$ with respect to its first and second argument, respectively, and $c^*$ lies on the line segment connecting $(w_t\gamma + \eta, w_\tau\gamma + \eta, \zeta)$ and $(w_\tau\gamma + \eta, w_t\gamma + \eta, \zeta)$. Thus, in this case $\bar\Lambda = \Lambda^{(1)}(c^*) - \Lambda^{(2)}(c^*)$, and by assumption will be bounded.

ASSUMPTION R6: (a) $x_t^*$ and $\varepsilon_t^*$ have bounded $4 + 2\delta$ moments conditional on $W$, for any $\delta \in (0, 1)$.

(b) $E(\Delta x' \Delta x \mid W)$ and $E(\Delta x' \Delta x\, \Delta v^2 \mid W)$ are continuous at $W = 0$ and do not vanish.

(c) $E(\Delta x' \bar\Lambda \mid W)$ is almost everywhere $r$ times continuously differentiable as a function of $W$ and has bounded derivatives.

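ASSUMPTION R7: The function $K: \mathbb{R} \to \mathbb{R}$ satisfies: (a) $\int K(\nu)\,d\nu = 1$, (b) $\int |K(\nu)|\,d\nu < \infty$, (c) $\sup_\nu |K(\nu)| < \infty$, (d) $\int |\nu^{r+1}|\,|K(\nu)|\,d\nu < \infty$, and (e) $\int \nu^j K(\nu)\,d\nu = 0$ for all $j = 1, \ldots, r$.

ASSUMPTION R8: $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$.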

From our analysis in Section 2, it is easy to see that Assumptions R1-R3 would suffice to identify $\beta$ for known $\gamma$. An identification scheme in the spirit of our discussion in Section 2 would obviously require support of $W$ at zero, as well as nonsingularity of the matrix $\Sigma_{xx}$, imposed by Assumption R3, analogous to the familiar full rank assumption.

The continuity of the distribution of the index $W$ imposed in Assumption R4 is a regularity condition common in kernel estimation of density and regression functions. It is precisely this continuity that renders the estimator $\hat\beta_n$ of Section 2 infeasible, even if $\gamma$ were known.

9 Notice that by Assumption R1, the functional form of $\Lambda$ is the same over time for the same individual, while by Assumption R2 it is also the same across individuals.

10 In principle, we could dispense with the assumption that $\bar\Lambda$ is bounded by assuming that $\bar\Lambda$ has finite fourth moment conditional on $W$.


Since our estimation scheme is based on pairs of observations for which $W_i \equiv \Delta w_i \gamma \cong 0$, it is obvious that additional smoothness conditions are required. These are imposed by Assumptions R4-R8. Notice in particular Assumption R5, which imposes a "Lipschitz continuity" property on the selection correction function $\Lambda(\cdot)$. It is easy to see that simple continuity will not be sufficient to guarantee that $\Delta\lambda_i \to 0$ as $W_i \to 0$, since $\Delta\lambda_i$ is not a function of $W_i$. Furthermore, similarly to kernel density and regression estimation, a high order of differentiability $r$ for certain functions of the index $W$, along with the appropriate choice of the kernel function and the bandwidth sequence, imply a faster rate of convergence in distribution for $\tilde\beta_n$. Specifically, we choose an $(r+1)$th order bias-reducing kernel, which by Assumption R7(e) is required to be negative in part of its domain.

The next lemma establishes the asymptotic properties of the infeasible estimator $\tilde\beta_n$.

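LEMMA 1: Let Assumptions R1-R8 hold. Define

$$\Sigma_{xx} \equiv f_W(0)\, E(\Delta x' \Delta x \mid W = 0),$$

$$\Sigma_{xv} \equiv f_W(0)\, E(\Delta x' \Delta x\, \Delta v^2 \mid W = 0) \int K(\nu)^2\, d\nu,$$

$$\Sigma_{x\lambda} \equiv \frac{1}{r!} \Big[\int \nu^{r+1} K(\nu)\, d\nu\Big]\, g^{(r)}(0),$$

where $g^{(r)}(0)$ is the $(k \times 1)$ vector of $r$th-order derivatives of

$$g(W) \equiv E(\Delta x' \bar\Lambda \mid W)\, f_W(W)$$

evaluated at $W = 0$. Then:

(a) $S_{xx} \xrightarrow{p} \Sigma_{xx}$.

(b) If $\sqrt{n h_n}\, h_n^{r+1} \to \tilde h$, with $0 \leq \tilde h < \infty$, then (i) $\sqrt{n h_n}\, S_{xv} \xrightarrow{d} N(0, \Sigma_{xv})$, and (ii) $\sqrt{n h_n}\, S_{x\lambda} \xrightarrow{p} \tilde h\, \Sigma_{x\lambda}$.

(c) If $\sqrt{n h_n}\, h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)} S_{xv} \xrightarrow{p} 0$, and (ii) $h_n^{-(r+1)} S_{x\lambda} \xrightarrow{p} \Sigma_{x\lambda}$.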

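The asymptotic properties of $\tilde\beta_n$ easily follow from the previous lemma. If $\sqrt{n h_n}\, h_n^{r+1} \to \tilde h$, then $\sqrt{n h_n}(\tilde\beta_n - \beta) \xrightarrow{d} N(\tilde h\, \Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$, while if $\sqrt{n h_n}\, h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\tilde\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.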

In order to derive the asymptotic properties of the feasible estimator $\hat\beta_n$, we will make the following additional assumptions.

ASSUMPTIONR9 In addition to the conditions of Assumption R7 the kernel function satisfies (a) K ( v ) is three times continuously differentiable with bounded deriuatiues and (b) IKr(vgtldv lIK(v)l dv l ~ ~ K ( v ) ~ d v and ~ v ~ K ( v ) ~ ~ v are finite


The conditions of Assumption R9 are satisfied, for example, for $K(\nu)$ being the standard normal density function, which is a second order kernel.

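ASSUMPTION R10: $x_t^*$, $\varepsilon_t^*$, and $w_t$ have bounded $8 + 4\delta$ moments conditional on $W$, for some $\delta \in (0, 1)$. In addition, $E(\Delta x_l\, \Delta v\, \Delta w_j \mid W)$ and $E(\Delta x_l\, \Delta v\, \Delta w_j\, \Delta w_m \mid W)$ are continuous at $W = 0$, for all $l = 1, \ldots, k$ and $j, m = 1, \ldots, q$.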

ASSUMPTION R11: The parameter vector $\gamma$ in the selection equation lies in a compact$^{11}$ set, and $\hat\gamma_n$ is a consistent estimator that satisfies $\hat\gamma_n - \gamma = O_p(n^{-p})$, where $2/5 < p \leq 1/2$.

For example, $p = 1/2$ if $\gamma$ is estimated by maximizing the conditional likelihood function.

ASSUMPTION R12: $h_n = h \cdot n^{-\mu}$, where $0 < h < \infty$, and $1 - 2p < \mu < p/2$.

Assumption R12 is crucial for establishing the result that follows. This result states that $\hat S_{xx}$, $\hat S_{xv}$, and $\hat S_{x\lambda}$ have the same probability limits as their infeasible counterparts $S_{xx}$, $S_{xv}$, and $S_{x\lambda}$, provided that the bandwidth sequence $h_n$ is chosen appropriately for any given rate of convergence of the first-step estimator, that is, for any given $p$, and for any degree of smoothness $r$.

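LEMMA 2: Let Assumptions R1-R12 hold. Then:

(a) $\hat S_{xx} - S_{xx} = o_p(1)$.

(b) If $\sqrt{n h_n}\, h_n^{r+1} \to \tilde h$, with $0 \leq \tilde h < \infty$, then (i) $\sqrt{n h_n}(\hat S_{xv} - S_{xv}) = o_p(1)$, and (ii) $\sqrt{n h_n}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

(c) If $\sqrt{n h_n}\, h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}(\hat S_{xv} - S_{xv}) = o_p(1)$, and (ii) $h_n^{-(r+1)}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.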

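Lemma 2 readily implies that if $\sqrt{n h_n}\, h_n^{r+1} \to \tilde h$, then $\sqrt{n h_n}(\hat\beta_n - \tilde\beta_n) = o_p(1)$, while if $\sqrt{n h_n}\, h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \tilde\beta_n) = o_p(1)$. Since $(\hat\beta_n - \beta) = (\hat\beta_n - \tilde\beta_n) + (\tilde\beta_n - \beta)$, we have the following theorem.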

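THEOREM 1: Let Assumptions R1-R12 hold.

(a) If $\sqrt{n h_n}\, h_n^{r+1} \to \tilde h$, with $0 \leq \tilde h < \infty$, then $\sqrt{n h_n}(\hat\beta_n - \beta) \xrightarrow{d} N(\tilde h\, \Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.

(b) If $\sqrt{n h_n}\, h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.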

11 Compactness of the parameter space is required for consistency of both Manski's estimator and the smoothed conditional maximum score estimator, while it is not required for the conditional maximum likelihood estimator. Notice, though, that since $\gamma$ can only be estimated up to scale, we can always normalize it so that it lies on the unit circle. Thus, the compactness assumption is not restrictive.


Thus, in the limit, the fact that we are using $\hat\gamma_n$ to estimate $\beta$ does not affect the asymptotic distribution of $\hat\beta_n$. The lower bound on $\mu$ imposed by Assumption R12 is the key for this result to hold. In words, this bound implies that $\beta$ is estimated at a rate slower than $\gamma$. Indeed, from Theorem 1, the rate of convergence of $\hat\beta_n$ is $(n h_n)^{-1/2} = n^{-(1-\mu)/2}$, which is obviously slower than $n^{-p}$, since $\mu > 1 - 2p$. Thus, in effect, Assumption R12 requires that $\sqrt{n h_n}(\hat\gamma_n - \gamma) = o_p(1)$.

In principle, we could allow $\beta$ to be estimated at the same rate as $\gamma$. Thus, if $\sqrt{n h_n}(\hat\gamma_n - \gamma) = O_p(1)$, for $\sqrt{n h_n}\, h_n^{r+1} \to \tilde h$, we obtain the following asymptotic representation, which may be easily derived from the analysis of Lemma 2(b) in the Appendix:

where

$$\Omega \equiv \operatorname*{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \frac{1}{h_n^2} K'\Big(\frac{W_i}{h_n}\Big) \Delta x_i'\, \Delta w_i\, \Delta\lambda_i\, \Phi_i,$$

provided that $E(\Delta x' \Delta w\, \bar\Lambda \mid W)$ is continuous at $W = 0$ and $\nu K(\nu) \to 0$ as $|\nu| \to \infty$. Asymptotic normality of $\hat\beta_n$ may still be established if $\sqrt{n h_n}(\hat\gamma_n - \gamma)$ has an asymptotic representation of the form $\sqrt{n h_n}(\hat\gamma_n - \gamma) = (1/\sqrt{n h_n}) \sum_{i=1}^n \psi(\Delta w_i, \Delta d_i, \gamma) + o_p(1)$.

At first glance, it looks attractive to eliminate the asymptotic bias of $\hat\beta_n$ by choosing $h_n$ so that $\sqrt{n h_n}\, h_n^{r+1} \to \tilde h = 0$ or, equivalently, by setting $\mu > 1/(2(r+1)+1)$. In that case, however, the rate of convergence of $\hat\beta_n$ is lower than when $\tilde h > 0$. Indeed, the rate of convergence in distribution of $\hat\beta_n$ is maximized by making $\mu$ as small as possible, that is, by setting $\mu = 1/(2(r+1)+1)$, in which case it becomes $n^{-(r+1)/(2(r+1)+1)}$. Thus, for $r$ large enough, the estimator converges at a rate that can be arbitrarily close to $n^{-1/2}$, provided also that $\gamma$ is estimated fast enough, that is, provided $p > (r+1)/(2(r+1)+1)$.
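The rate arithmetic in the preceding paragraph can be checked with a few lines of Python. This is only an illustrative sketch: the function names (`bandwidth_exponent`, `convergence_exponent`, `bandwidth`) are hypothetical and the bandwidth constant defaults to one by assumption.

```python
from fractions import Fraction


def bandwidth_exponent(r):
    """mu = 1 / (2(r+1) + 1), the exponent that maximizes the convergence rate."""
    return Fraction(1, 2 * (r + 1) + 1)


def convergence_exponent(r):
    """Exponent a such that the estimator converges at rate n**(-a)."""
    return Fraction(r + 1, 2 * (r + 1) + 1)


def bandwidth(n, r, h=1.0):
    """h_n = h * n**(-mu); the constant h = 1 is an assumed default."""
    return h * n ** (-float(bandwidth_exponent(r)))


# The convergence exponent equals (1 - mu) / 2 and approaches 1/2 as r grows.
for r in (1, 3, 5):
    assert convergence_exponent(r) == (1 - bandwidth_exponent(r)) / 2
```

For a second order kernel ($r = 1$) the sketch gives $\mu = 1/5$ and the rate $n^{-2/5}$; for $r = 3$ and $r = 5$ it gives $n^{-4/9}$ and $n^{-6/13}$, respectively.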

Although the proposed estimator is asymptotically biased, it is possible to eliminate the asymptotic bias while maintaining the maximal rate of convergence, in the manner suggested by Bierens (1987).

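COROLLARY: Let $\hat\beta_n$ be the estimator with window width $h_n = h \cdot n^{-1/(2(r+1)+1)}$, and $\hat\beta_{n,\delta}$ the estimator with window width $h_{n,\delta} = h \cdot n^{-\delta/(2(r+1)+1)}$, where $\delta \in (0, 1)$. Define

$$\hat{\hat\beta}_n \equiv \frac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\, \hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}}.$$

Then $n^{(r+1)/(2(r+1)+1)}(\hat{\hat\beta}_n - \beta) \xrightarrow{d} N(0,\ h^{-1}\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.

12 We can also derive an asymptotic representation for $\hat\beta_n$ in the case where $\gamma$ is estimated at rate $n^{-p}$ that is slower than $1/\sqrt{n h_n}$. In this case we obtain $n^p(\hat\beta_n - \beta) = \Sigma_{xx}^{-1}\Omega\, n^p(\hat\gamma_n - \gamma) + o_p(1)$, which implies that $\hat\beta_n$ converges at the same rate as $\hat\gamma_n$, which is slower than the optimal rate obtained for the infeasible estimator $\tilde\beta_n$, that is, when $\gamma$ is known.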
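The linear combination in the Corollary can be sketched in Python as follows. This is a minimal illustration for a scalar coefficient, with a hypothetical function name (`bias_corrected`); it assumes the two input estimates were computed with the window widths $h \cdot n^{-1/(2(r+1)+1)}$ and $h \cdot n^{-\delta/(2(r+1)+1)}$ for the same constant $h$.

```python
def bias_corrected(beta_h, beta_h_delta, n, r, delta):
    """Combine two estimates so that the leading bias term cancels.

    beta_h       : estimate using bandwidth h * n**(-1 / (2(r+1)+1))
    beta_h_delta : estimate using bandwidth h * n**(-delta / (2(r+1)+1))
    The name and signature are illustrative assumptions.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    # c is the ratio of the two leading bias terms, (h_n / h_{n,delta})**(r+1).
    c = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    return (beta_h - c * beta_h_delta) / (1.0 - c)
```

The design rests on the leading bias being proportional to the bandwidth raised to the power $r+1$: if both estimates equal $\beta$ plus that leading term exactly, the combination returns $\beta$.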

In order to compute $\hat\beta_n$ (or $\hat{\hat\beta}_n$) in an application, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator will likely be more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant $h$. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.

For a given order of differentiability $r$ and a given sample size $n$, the results of Theorem 1 suggest that $h_n = h \cdot n^{-\mu}$ be chosen so that $\mu = 1/(2(r+1)+1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant $h$. A natural way to proceed (see Horowitz (1992) and Härdle (1990)) is to choose $h$ so as to minimize some kind of measure of the "distance" of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

$$MSE \equiv h^{2(r+1)}\, \Sigma_{x\lambda}'\Sigma_{xx}^{-1} A\, \Sigma_{xx}^{-1}\Sigma_{x\lambda} + h^{-1}\, \mathrm{trace}\big[\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1} A\big]$$

for any nonstochastic positive semidefinite matrix $A$ that satisfies $\Sigma_{x\lambda}'\Sigma_{xx}^{-1} A\, \Sigma_{xx}^{-1}\Sigma_{x\lambda} \neq 0$. It is straightforward to show that MSE is minimized by setting

$$h = h^* \equiv \left[\frac{\mathrm{trace}\big[\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1} A\big]}{2(r+1)\, \Sigma_{x\lambda}'\Sigma_{xx}^{-1} A\, \Sigma_{xx}^{-1}\Sigma_{x\lambda}}\right]^{1/(2(r+1)+1)}. \qquad (3.21)$$

This last expression suggests that we may construct a consistent estimate of $h^*$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $n h_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{xv}$ and $\Sigma_{x\lambda}$.
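For a scalar coefficient with $A = 1$, the minimization behind equation (3.21) reduces to simple arithmetic, which the following Python sketch illustrates. The function names (`amse`, `optimal_constant`) and the numerical inputs are hypothetical; the three arguments `sxx`, `sxv`, and `sxl` stand in for (estimates of) $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$.

```python
def amse(h, sxx, sxv, sxl, r):
    """Asymptotic MSE as a function of the bandwidth constant h (scalar case, A = 1)."""
    bias = sxl / sxx          # Sigma_xx^{-1} Sigma_xlambda
    var = sxv / (sxx * sxx)   # Sigma_xx^{-1} Sigma_xv Sigma_xx^{-1}
    return h ** (2 * (r + 1)) * bias * bias + var / h


def optimal_constant(sxx, sxv, sxl, r):
    """Minimizer of amse over h > 0, the scalar analogue of equation (3.21)."""
    bias = sxl / sxx
    var = sxv / (sxx * sxx)
    if bias == 0.0:
        raise ValueError("the bias term must be nonzero for a finite minimizer")
    return (var / (2 * (r + 1) * bias * bias)) ** (1.0 / (2 * (r + 1) + 1))
```

The first-order condition $2(r+1)h^{2(r+1)-1}\,\text{bias}^2 = h^{-2}\,\text{var}$ gives the closed form used in `optimal_constant`; a larger variance term pushes the constant up, while a larger bias term pushes it down.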

THEOREM 2: Assume that Assumptions R1-R12 hold. (a) Let $\hat\beta_n$ be a consistent estimator of $\beta$ based on $h_n = h \cdot n^{-1/(2(r+1)+1)}$, and define $\hat e_i \equiv \Delta y_i - \Delta x_i \hat\beta_n$. Then

(b) Let $h_{n,\delta} = h \cdot n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then, for $\hat e_i$ defined as in part (a),

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's world wide web page.

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given $r$ and $n$, choose any $h_n = h \cdot n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h \cdot n^{-\delta/(2(r+1)+1)}$, with $h$ an arbitrary positive constant and $0 < \delta < 1$. Compute $\hat\beta_n$ based on $h_n$ and construct $\hat e_i$ as defined in Theorem 2. Use $\hat e_i$ to compute the estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ as discussed above. Then estimate $h^*$ by $\hat h^*$ using equation (3.21), with $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h^*$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the "plug-in" method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant $h$. Second, by specifying the order of differentiability $r$, the researcher is restricted to a certain smoothness class.

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h \cdot n^{-1/(2(r+1)+1)}$, run OLS regression of $\tilde Y_i \equiv \sqrt{K(\hat W_i / h_n)}\, \Delta y_i \Phi_i$ on $\tilde X_i \equiv \sqrt{K(\hat W_i / h_n)}\, \Delta x_i \Phi_i$, and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix, using the residuals from the regression, $\hat e_i = \tilde Y_i - \tilde X_i \hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$ as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\tilde Y_i$ on $\tilde X_i$ with bandwidth $h_{n,\delta}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$ in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency$^{14}$ of $\hat\beta_n$ if $h_n^{-1}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e. for $h_n \to 0$ and $n h_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{n h_n}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{n h_n}\, h_n^{r+1} \to \tilde h$, with $0 \leq \tilde h < \infty$.

The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-$n$ asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3} h_n \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a "smoothed" conditional maximum score estimator, which under weak (but stronger than Manski's) assumptions is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness

14 Consistency of p may be established under the weaker restriction that zllF - yll = o(l) The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion This modification does not alter the basic restriction for obtaining an asymptotic distribution for 6which does not depend on the estimation of y in the first step namely that y has to be estimated at a faster rate than p Notice that in this case the upper bound on u in Assumption R12 would have to be replaced by ( 6p- 1)7 However this modification would affect the proof of Theorem 2 which would become unnecessarily complicated and long


we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, the first step estimation method, the performance in practice of the proposed plug-in method for estimating the bandwidth constant, and finally the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

$$y_{it}^* = x_{it}^* \beta + \alpha_i^* + \varepsilon_{it}^*, \qquad d_{it} = 1\{w_{1it}\gamma_1 + w_{2it}\gamma_2 + \eta_i - u_{it} \geq 0\},$$

where $\beta_0 = 1$; $\gamma_1 = \gamma_2 = 1$; $w_{1it}$ and $w_{2it}$ are independent $N(-1, 1)$ variables; $\eta_i = (w_{1i1} + w_{1i2})/2 + 2\xi_{1i}$, with $\xi_{1i}$ an independent variable distributed uniformly over the interval $(0, 1)$; $u_{it}$ is logistically distributed, normalized to have variance equal to 1; $x_{it}^* = w_{2it}$; $\alpha_i^* = (w_{2i1} + w_{2i2})/2 + \xi_{2i}$, with $\xi_{2i}$ an independent $N(0, 2)$ variable; and $\varepsilon_{it}^* = 0.8\,\xi_{3it} + 0.6\,u_{it}$, with $\xi_{3it}$ an independent standard normal variable. All data are generated i.i.d. across individuals and over time. This design implies that $\Pr(d_{i1} + d_{i2} = 1) = 0.37$ and $\Pr(d_{i1} = d_{i2} = 1) = 0.31$, so that approximately 37 percent of each sample is used in the first step estimation of the selection equation, and approximately 31 percent in the second step. Each Monte Carlo experiment is performed 1000 times, while the same pseudorandom number sequences are used for each one of three different sample sizes $n$: 250, 1000, and 4000.

Table I presents the finite sample properties of the "naive" estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences using only those individuals that are selected into the sample in both time periods, i.e. those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator, with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$, at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$
Columns: Mean Bias, Median Bias, RMSE, MAD

Panel B: Sizes of t tests
Columns: 0.01, 0.05, 0.10, 0.20

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(\nu) = \phi(\nu)$, where $\phi$ is the density of the standard normal distribution, which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h \cdot n^{-1/(2(r+1)+1)} = h \cdot n^{-1/5}$, with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.$^{15}$ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_L$, which in this case will be consistent since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,$^{16}$ denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator, denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction); $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)
Columns in each group: Mean Bias, Median Bias, RMSE, MAD

Panel A: True $\gamma$
0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B: $\hat\gamma_L$
0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C: $\hat\gamma_{CMS}$
0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D: $\hat\gamma_{SCMS4}$
0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E: $\hat\gamma_{SCMS2}$
0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E) the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.

15 In the construction of the kernel weights of both the infeasible estimator $\tilde\beta_n$ of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i=1}^n \Delta d_i \cdot 1\{\Delta w_{1i} g_1 + \Delta w_{2i} g_2 \geq 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.

17 The SCMS estimates are computed by maximizing

over all $g \in \mathbb{R}^2$ that have $|g_2| = 1$ and $g_1$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(\nu) = K_4(\nu)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(\nu) = \Phi(\nu)$, where $\Phi$ is the standard normal cumulative distribution function. In this case, the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5 \cdot n^{-1/(2h+1)}$ ($h = 2$ or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).
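The grid search described in footnote 16 for the conditional maximum score estimates can be sketched in Python as follows. This is an illustrative sketch only, not the routine used in the study: the function name `cms_grid_search` and its arguments are hypothetical, and ties in the objective are broken by keeping the first maximizer on the grid, which is an assumption of this sketch.

```python
import math


def cms_grid_search(dd, dw1, dw2, n_grid=2000):
    """Grid-search sketch of the conditional maximum score estimator.

    dd       : differences d_i2 - d_i1 (values in {-1, 0, 1})
    dw1, dw2 : first differences of the two selection regressors
    Returns the unit-norm pair (g1, g2) = (sin g, cos g) with the highest score.
    """
    n = len(dd)
    best_score = -math.inf
    best_g = (0.0, 1.0)
    for j in range(n_grid):
        angle = 2.0 * math.pi * j / n_grid  # equispaced grid on [0, 2*pi)
        g1, g2 = math.sin(angle), math.cos(angle)
        # Score: average of Delta d_i over observations with a nonnegative index.
        score = sum(d for d, a, b in zip(dd, dw1, dw2) if a * g1 + b * g2 >= 0.0) / n
        if score > best_score:  # strict inequality keeps the first maximizer
            best_score, best_g = score, (g1, g2)
    return best_g
```

Because the objective is a step function of the angle, the maximizer is generally a set of grid points rather than a single point; only individuals who switch selection status ($\Delta d_i \neq 0$) affect the score.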

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\tilde\beta_n$ and its bias-corrected counterpart using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).$^{18}$

In Panels C and D we use a fourth and a sixth order bias-reducing kernel$^{19}$ and set $h_n = n^{-1/(2(r+1)+1)}$, with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.
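The moment conditions that make a kernel "bias-reducing" (Assumption R7(a) and (e)) can be checked numerically for the fourth and sixth order kernels of footnote 19. The Python sketch below is illustrative only: it assumes the kernels are normalized by $1/\sqrt{2\pi}$ so that they integrate to one, and the function names (`k4`, `k6`, `moment`) and the integration grid are hypothetical choices.

```python
import math


def normal_pdf(v, scale=1.0):
    """Density of N(0, scale**2)."""
    return math.exp(-0.5 * (v / scale) ** 2) / (scale * math.sqrt(2.0 * math.pi))


def k4(v):
    """Fourth order kernel: mixture of N(0, 1) and N(0, 11) densities."""
    return 1.1 * normal_pdf(v) - 0.1 * normal_pdf(v, math.sqrt(11.0))


def k6(v):
    """Sixth order kernel: mixture of N(0, 1), N(0, 4), and N(0, 9) densities."""
    return 1.5 * normal_pdf(v) - 0.6 * normal_pdf(v, 2.0) + 0.1 * normal_pdf(v, 3.0)


def moment(kernel, j, lo=-40.0, hi=40.0, steps=40000):
    """Midpoint-rule approximation of the integral of v**j * K(v)."""
    width = (hi - lo) / steps
    total = 0.0
    for s in range(steps):
        v = lo + (s + 0.5) * width
        total += v ** j * kernel(v)
    return total * width
```

Under these assumptions, the second moment of `k4` and the second and fourth moments of `k6` vanish, while the kernels integrate to one; both kernels take negative values in part of their domain, for example `k4(4.0)` is negative.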

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h^*$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h^*$ is larger than the initial constant (here the initial bandwidth constant is set equal to one$^{20}$). Table V displays the mean of $\hat h^*$ across 1000 replications for different specifications of the initial constant for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here, the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

19 The fourth-order kernel is $K_4(\nu) = 1.1\exp(-\nu^2/2) - 0.1\exp(-\nu^2/(2 \cdot 11))(1/\sqrt{11})$, and the sixth-order kernel is $K_6(\nu) = 1.5\exp(-\nu^2/2) - 0.6\exp(-\nu^2/(2 \cdot 4))(1/\sqrt{4}) + 0.1\exp(-\nu^2/(2 \cdot 9))(1/\sqrt{9})$. See Bierens (1987).

20 We chose the initial h equal to one as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 05 to 10 was performed for a sample size n = 100000


TABLE III

FINITE SAMPLE PROPERTIES OF $\tilde\beta_n$ AND ITS BIAS-CORRECTED VERSION; TRUE $\gamma$

Column groups: Without Asymptotic Bias Correction; With Asymptotic Bias Correction
Columns in each group: Mean Bias, Median Bias, RMSE, MAD

Panel A: $K(\nu) = \phi(\nu)$, $h_n = 0.5 \cdot n^{-1/5}$
0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B: $K(\nu) = \phi(\nu)$, $h_n = 3 \cdot n^{-1/5}$
0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C: $K(\nu) = K_4(\nu)$, $h_n = n^{-1/9}$
0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D: $K(\nu) = K_6(\nu)$, $h_n = n^{-1/13}$
0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for $n$ = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A, and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

The estimated standard errors of the median bias estimates for $n$ = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A, and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = \hat h^* \cdot n^{-1/5}$, INITIAL $h = 1$, $K(\nu) = \phi(\nu)$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction); $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)
Columns in each group: Mean Bias, Median Bias, RMSE, MAD

Panel A: True $\gamma$
0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B: $\hat\gamma_L$
0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C: $\hat\gamma_{CMS}$
0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D: $\hat\gamma_{SCMS4}$
0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401

EKATERINI KYRIAZIDOU

TABLE V

Initial h = 0.5 | Initial h = 1 | Initial h = 2 | Initial h = 3

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h = n^{-1/5}$, $K(v) = \phi(v)$. Panel shown: True $\gamma$.
Note: Figures 1a, 1d, 1g: n = 250. Figures 1b, 1e, 1h: n = 1000. Figures 1c, 1f, 1i: n = 4000.


estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.
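The construction of such a quantile-quantile comparison can be sketched as follows. This is a minimal illustration, not the procedure used for Figure 1: the plotting positions $(i + 0.5)/n$ and the function name `qq_points` are assumptions, and the input would be the Monte Carlo estimates of the coefficient.

```python
from statistics import NormalDist, mean, stdev


def qq_points(estimates):
    """Pair each ordered estimate with the matching quantile of a normal
    distribution that has the same mean and standard deviation as the sample.

    Returns a list of (normal_quantile, sample_quantile) tuples.
    """
    ordered = sorted(estimates)
    n = len(ordered)
    reference = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n are a common convention, assumed here.
    return [(reference.inv_cdf((i + 0.5) / n), value)
            for i, value in enumerate(ordered)]


if __name__ == "__main__":
    for normal_q, sample_q in qq_points([0.9, 1.1, 1.0, 1.3, 0.8]):
        print(round(normal_q, 3), sample_q)
```

If the finite sample distribution is close to normal, the returned points lie near the 45-degree line.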

Finally, we examine the size of t tests, where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$. To this end, we construct t statistics for $\hat\beta_n$ and $\tilde\beta_n$ for the specification of Table II (that is, using a second order kernel and $h = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that, although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\tilde\beta_n$) does not obtain in this case due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
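The rejection fractions described above can be computed along the following lines. This is a sketch under stated assumptions: the tests are taken to be two-sided with standard normal critical values, and the function names `t_statistic` and `empirical_size` are hypothetical. The estimates and standard errors would come from the Monte Carlo replications.

```python
from statistics import NormalDist


def t_statistic(estimate, standard_error, null_value=1.0):
    """t statistic for the null hypothesis that the coefficient equals null_value."""
    return (estimate - null_value) / standard_error


def empirical_size(t_stats, levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of replications in which a two-sided test rejects at each nominal level."""
    sizes = {}
    for level in levels:
        # Two-sided critical value from the standard normal distribution (assumed).
        critical = NormalDist().inv_cdf(1.0 - level / 2.0)
        rejections = sum(1 for t in t_stats if abs(t) > critical)
        sizes[level] = rejections / len(t_stats)
    return sizes


if __name__ == "__main__":
    stats = [t_statistic(b, se) for b, se in [(1.05, 0.1), (0.7, 0.1), (1.2, 0.15)]]
    print(empirical_size(stats))
```

An actual level close to the nominal level at each entry indicates that the asymptotic approximation works well for inference.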

TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\tilde\beta_n$; $h = n^{-1/5}$, $K(v) = \phi(v)$

$\hat\beta_n$ (Without Asymptotic Bias Correction): 0.01, 0.05, 0.10, 0.20
$\tilde\beta_n$ (With Asymptotic Bias Correction): 0.01, 0.05, 0.10, 0.20

Panel A: True $\gamma$
0.1610 0.2530 0.0590
0.1240 0.2180 0.0260
0.1120 0.2260 0.0210

Panel B: $\hat\gamma_L$
0.1580 0.2680 0.0450
0.1160 0.2140 0.0230
0.1140 0.2250 0.0180

Panel C: $\hat\gamma_{CMS}$
0.1600 0.2720 0.0610
0.1170 0.2160 0.0350
0.1180 0.2390 0.0240

Panel D: $\hat\gamma_{SCMS}$
0.1430 0.2570 0.0280
0.1220 0.2250 0.0190
0.1230 0.2430 0.0250


5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will be also left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models, where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMAAl Let S = is a random sam- (ln)Z=l ( l h )L (M h )Z v s 2 0 where (Z y)]= ple from a disirrbuiron that has E ( I Z I ~ I for L ~at~sfiesW )ltM lt almost all W and the functron

P ~ l v(v ) l d v lt M Then E ( S ) = O ( k i ) and var(S) = O(hnh) Tlzus for s 2 1 S + 0 while for

P s = 0 S +f(O)E(ZI W = O)lL(v)d v procrded that E ( Z I W ) rs contrnuo~ts at W = 0


PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

LEMMAA2 (Liapounov CLT for doublc arrays) Let = (1 l t l )~= I tiwhere an Independent sequence of scalar random ~arrables that satis$es E( (I0 var( (I lt rn var(= +

V lt aand I3= El ( 61 +0 for some 8 E (01) as n + Then Jizh~N(0 V)

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1974).

COROLLARY = where (Z U)l= 1s a random sample from a Al Let ( ( I amp)L(w~)z d~stnbutlonsuch that E(ZI W) = 0 and E(IZI 1 W) lt M lt w for almost all W E(Z2 I W) IS conhnuous at W = 0 and the functlon L satrsfies llL(v)l dv lt 53 Then KS= ( l ix)~l=amp N(0

f W ( 0 ) ~ ( Z 2 I ~ =O ) ~ L ( V ) ~ ~ V )

PROOFOF LEMMA1 (a) Apply Lemma A1 with 2= Ax Ax di (1 j = I k) s = 0 and L(v ) = K(v)

(b-i) Apply Lemma A2 with tt= c1(1 amp)K(Uh) Ax Ac where c is a k X 1 vector of constants such that cc = 1

(b-ii) Note that by Assumption R5 Ah = AWThus wc may write

S = Ax( 1 ~ 1 ) I 3 ~ = ~ ( l h ) K ( H ( h )

Therefore E(S) = l(lh)K(Wh)Wg(W) dW where g(W) - E(Axr AlW)fw(W) is by assumption r times colltinuously differentiable with derivatives that are bounded on the support of W and has g(0) lt m A Taylor series expansion of g() around 0 and a change of variables W = vh lead to

17

1


for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKS: (i) In what follows, M stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$.

(iii) In the Taylor series expansions, c stands for a generic value between U and

PROOFOF LEMMA2 (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a


(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that A and A are 0(1) while A = o(l) The desired result will then follow from the fact that p lt p 2 implies that h i 1 ( - y ) = Op(niL-1= o(l)

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

We will show that Bl and B are 0(1) while B3 = o(l) Thc desired result will thenfollow from the fact that 1 - 2 p lt u lt p 2 implies that hi1(- y ) = Op(nL-) = o(l) and - y ) = o(n -~-) = o(l)


Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for r r 1

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+


where Ai and A are defined as in the proof of part (b-1) As we showed there both these quantities are bounded in probability for any h that satisfies h -O and nh -t 13 as n increases Furthermore from (1) above hi1( - y) = op(l) T ~ L I S OP(nF-) = the first two terms of the sum above are o(l) Now by (21

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where Bi and B2 are defined as in the proof of part (b-ii) and as we showed there they arc houndcd in probability for any I that satisfies nh + 13 as n increases Thus the first two terms of the sum above are o(l) Furthermore

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.


Estimation of a Panel Data Sample Selection Model. Ekaterini Kyriazidou. Econometrica, Vol. 65, No. 6 (Nov., 1997), pp. 1335-1364.



The most typical concern in empirical work using panel data has been the presence of unobserved heterogeneity. Heterogeneity across economic agents may arise, for example, as a result of different preferences, endowments, or attributes. These permanent individual characteristics are commonly unobservable, or may simply not be measurable due to their qualitative nature. Failure to account for such individual-specific effects may result in biased and inconsistent estimates of the parameters of interest. In linear panel data models these unobserved effects may be differenced out using the familiar within (fixed-effects) approach. This method is generally not applicable in limited dependent variable models. Exceptions include the discrete choice model studied by Rasch (1960, 1961), Andersen (1970), and Manski (1987), and the censored and truncated regression models (Honoré (1992, 1993)). See also Chamberlain (1984) and Hsiao (1986) for a discussion of panel data methods.

The simultaneous presence of sample selectivity and unobserved heterogeneity has been noted in empirical work (as, for example, in Hausman and Wise (1979), Nijman and Verbeek (1992), and Rosholm and Smith (1994)). Given the pervasiveness of either problem in panel data studies, it appears highly desirable to be able to control for both of them simultaneously. The present paper is a step in this direction.

In particular, we consider the problem of estimating a panel data model, where both the sample selection rule, assumed to follow a binary response model, and the (linear) regression equation of interest contain additive permanent unobservable individual-specific effects that may depend on the observable explanatory variables in an arbitrary way. In this type 2 Tobit model (in the terminology of Amemiya (1985)), sample selectivity induces a fundamental nonlinearity in the equation of interest with respect to the unobserved characteristics, which, in contrast to linear panel data models, cannot be differenced away. This is because the sample selection effect, which enters additively in the main equation, is a (generally unknown) nonlinear function of both the observed time-varying regressors and the unobservable individual effects of the selection equation, and is therefore not constant over time.

Furthermore, even if one were willing to specify the distribution of the underlying time-varying errors (for example, normal) in order to estimate the model by maximum likelihood, the presence of unobservable effects in the selection rule would require that the researcher also specify a functional form for their statistical dependence on the observed variables. Apart from being nonrobust to distributional misspecification, this fully parametric random effects approach is also computationally cumbersome, as it requires multiple numerical integration over both the unobservable effects and the entire length of the panel. Heckman's (1976, 1979) two-step correction, although computationally much more tractable, also requires full specification of the underlying distributions of the unobservables, and is therefore susceptible to inconsistencies due to misspecification. Thus, the results of this paper will be important even if the distribution of the individual effects is the only nuisance parameter in the model.


Panel data selection models with latent individual effects have been most recently considered by Verbeek and Nijman (1992) and Wooldridge (1995), who proposed methods for testing and correcting for selectivity bias. A crucial assumption underlying these methods is the parameterization of the sample selection mechanism. Specifically, these authors assume that both the unobservable effect and the idiosyncratic errors in the selection process are normally distributed. The present paper is an important departure from this work, in the sense that the distributions of all unobservables are left unspecified.

We focus on the case where the data consist of a large number of individuals observed through a small number of time periods, and analyze asymptotics as the number of individuals (n) approaches infinity. Short-length panels are not only the most relevant for practical purposes; they also pose problems in estimation. In such cases, even if the individual effects are treated as parameters to be estimated, a parametric maximum likelihood approach yields inconsistent estimates, the well known incidental parameters problem.

Our method for estimating the main regression equation of interest follows the familiar two-step approach proposed by Heckman (1974, 1976) for parametric selection models, which has been used in the construction of most semiparametric estimators for such models. In the first step, the unknown coefficients of the selection equation are consistently estimated. In the second step, these estimates are used to estimate the equation of interest by a weighted least squares regression. The fixed effect from the main equation is eliminated by taking time differences on the observed selected variables, while the first-step estimates are used to construct weights, whose magnitude depends on the magnitude of the sample selection bias. For a fixed sample size, observations with less selectivity bias are given more weight, while asymptotically, only those observations with zero bias are used. This idea has been used by Powell (1987) and Ahn and Powell (1993) for the estimation of cross sectional selection models. The intuition is that, for an individual that is selected into the sample in two time periods, it is reasonable to assume that the magnitude of the selection effect in the main equation will be the same if the observed variables determining selection remain constant over time. Therefore, time differencing the outcome equation will eliminate not only its unobservable individual effect but also the sample selection effect. In fact, by imposing a linear regression structure on the latent model underlying the selection mechanism, the above argument will also hold if only the linear combination of the observed selection covariates, known up to a finite number of estimable parameters, remains constant over time. Under appropriate assumptions on the rate of convergence of the first step estimator, the proposed estimator of the main equation of interest is shown to be consistent and asymptotically normal, with a rate of convergence that can be made arbitrarily close to $n^{-1/2}$. In particular, by assuming that the selection equation is estimated at a faster rate than the main equation, we obtain a limiting distribution which does not depend on the distribution of the first step estimator.
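The weighting idea described in this paragraph can be sketched for the special case of a single regressor. The code below is only an illustration under stated assumptions: the Gaussian kernel, the weight form $K(\Delta w_i \hat\gamma / h)/h$, and all function and argument names are hypothetical choices made here, not the paper's exact estimator.

```python
import math


def gaussian_kernel(v):
    """Standard normal density, used here as an illustrative kernel."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def kernel_weighted_fd_estimate(dy, dx, dw_index, selected, h, kernel=gaussian_kernel):
    """Kernel-weighted least squares on first differences (scalar regressor).

    dy, dx    : first differences of the outcome and of the regressor
    dw_index  : first difference of the estimated selection index, (w_i1 - w_i2) * gamma_hat
    selected  : 1 if the individual is observed in both periods, else 0
    h         : bandwidth; smaller values concentrate weight on dw_index near zero
    """
    numerator = 0.0
    denominator = 0.0
    for dy_i, dx_i, index_i, s_i in zip(dy, dx, dw_index, selected):
        if not s_i:
            continue  # only individuals selected in both periods contribute
        weight = kernel(index_i / h) / h
        numerator += weight * dx_i * dy_i
        denominator += weight * dx_i * dx_i
    if denominator == 0.0:
        raise ValueError("no usable observations for the chosen bandwidth")
    return numerator / denominator


if __name__ == "__main__":
    estimate = kernel_weighted_fd_estimate(
        dy=[2.1, -3.9, 0.4], dx=[1.0, -2.0, 0.2],
        dw_index=[0.05, -0.10, 1.50], selected=[1, 1, 1], h=0.5)
    print(estimate)
```

Observations whose selection index changes little between periods receive the largest weights, which is the sense in which observations with less selectivity bias count more.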


The first step of the proposed estimation method requires that the discrete choice selection equation be estimated consistently and at a sufficiently fast rate. To this end, we propose using a smoothed version of Manski's (1987) conditional maximum score estimator,³ which follows the approach taken by Horowitz (1992) for estimating cross section discrete choice models. Under appropriate assumptions, stronger than those in Manski (1987), the smoothed estimator improves on the rate of convergence of the original estimator, and also allows standard statistical inference. Furthermore, it dispenses with parametric assumptions on the distribution of the errors, required for example by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970).

Although our analysis is based on the assumption of a censored panel with only two observations per individual, it easily generalizes to the case of a longer and possibly unbalanced panel, and may be also modified to accommodate truncated samples, in which case estimation of the selection equation is infeasible. Extensions of our estimation method to cover these situations are discussed at the end of the next section.

The paper is organized as follows. Section 2 describes the model and motivates the proposed estimation procedure. Section 3 states the assumptions and derives the asymptotic properties of the estimator. Section 4 presents the results of a Monte Carlo study investigating the small sample performance of the proposed estimator. Section 5 offers conclusions and suggests topics for future research. The proofs of theorems and lemmata are given in the Appendix.

2. THE MODEL AND THE PROPOSED ESTIMATOR

We consider the following model:

$$
(2.2)\qquad d_{it} = 1\{w_{it}\gamma + \eta_i - u_{it} \ge 0\}.
$$

Here, $\beta \in \mathbb{R}^k$ and $\gamma \in \mathbb{R}^q$ are unknown parameter vectors which we wish to estimate,⁴ $x_{it}^*$ and $w_{it}$ are vectors of explanatory variables (with possibly common elements), $\alpha_i^*$ and $\eta_i$ are unobservable time-invariant individual-specific effects⁵ (possibly correlated with the regressors and the errors), $\varepsilon_{it}^*$ and $u_{it}$ are unobserved disturbances (not necessarily independent of each other), while $y_{it}^* \in \mathbb{R}$ is a latent variable whose observability depends on the outcome of the indicator

3 The smoothed conditional maximum score estimator for binary response panel data models, along with its asymptotic properties and necessary assumptions, is presented in an earlier version of this paper (Kyriazidou (1994)). See also Charlier, Melenberg, and van Soest (1995).

4 Obviously, constants cannot be identified in either equation, since they would be absorbed in the individual effects.

5 These will be treated as nuisance parameters and will not be estimated. Our analysis also applies to the case where $\alpha_i^* = \eta_i$.


variable $d_{it} \in \{0, 1\}$. In particular, it is assumed that, while $(d_{it}, w_{it})$ is always observed, $(y_{it}^*, x_{it}^*)$ is observed only⁶ if $d_{it} = 1$. In other words, the selection variable $d_{it}$ determines whether the it-th observation in equation (2.1) is censored or not. Thus, our problem is to estimate $\beta$ and $\gamma$ from a sample consisting of quadruples $(d_{it}, w_{it}, y_{it}, x_{it})$. We will denote the vector of (observed and unobserved) explanatory variables by $\zeta_i \equiv (w_{i1}, w_{i2}, x_{i1}^*, x_{i2}^*, \alpha_i^*, \eta_i)$. Notice that, without the fixed effects $\alpha_i^*$ and $\eta_i$, our model becomes a panel data version of the well known sample selection model considered in the literature, and could be estimated by any of the existing methods. Without sample selectivity, that is with $d_{it} = 1$ for all i and t, equation (2.1) is the standard panel data linear regression model.
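To make the observability structure concrete, the following sketch simulates a two-period panel with scalar regressors. Everything specific in it is an assumption for illustration: the latent outcome is taken to be $y_{it}^* = x_{it}^*\beta + \alpha_i^* + \varepsilon_{it}^*$, the distributions and correlation patterns are arbitrary, and the function name `simulate_panel` is hypothetical. It is not the Monte Carlo design of the paper.

```python
import random


def simulate_panel(n, beta=1.0, gamma=1.0, seed=0):
    """Simulate n individuals over two periods from a hypothetical selection design.

    Returns one dict per (i, t) holding the observables (d, w, y, x), where
    y = d * y_star and x = d * x_star, so unselected outcomes are censored to zero.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        eta = rng.gauss(0.0, 1.0)                 # effect in the selection equation
        alpha = 0.5 * eta + rng.gauss(0.0, 1.0)   # effect in the main equation
        for t in (1, 2):
            w = rng.gauss(0.0, 1.0) + 0.5 * eta   # regressor correlated with the effect
            x_star = rng.gauss(0.0, 1.0) + 0.5 * w
            u = rng.gauss(0.0, 1.0)
            eps_star = 0.5 * u + rng.gauss(0.0, 1.0)  # correlated errors create selectivity
            d = 1 if w * gamma + eta - u >= 0 else 0  # selection rule as in (2.2)
            y_star = x_star * beta + alpha + eps_star  # assumed latent outcome equation
            rows.append({"i": i, "t": t, "d": d, "w": w,
                         "y": d * y_star, "x": d * x_star})
    return rows


if __name__ == "__main__":
    panel = simulate_panel(5, seed=42)
    for row in panel:
        print(row)
```

Only individuals with d = 1 in both periods would enter a first-differenced regression of the outcome.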

In our setup, it is possible to estimate $\gamma$ in the discrete choice selection equation (2.2) using either the conditional maximum likelihood approach proposed by Rasch (1960, 1961) and Andersen (1970), or the conditional maximum score method proposed by Manski (1987). On the other hand, estimation of $\beta$ based on the main equation of interest (2.1) is confronted with two problems: first, the presence of the unobservable effect $\alpha_{it} \equiv d_{it}\alpha_i^*$; and second and more fundamental, the potential endogeneity of the regressors $x_{it} \equiv d_{it}x_{it}^*$, which arises from their dependence on the selection variable $d_{it}$ and which may result in selection bias.

The first problem is easily solved by noting that, for those observations that have $d_{i1} = d_{i2} = 1$, time differencing will eliminate the effect $\alpha_{it}$ from equation (2.1). This is analogous to the fixed-effects approach taken in linear panel data models. In general though, application of standard methods, e.g. OLS, on this first-differenced subsample will yield inconsistent estimates of $\beta$ due to sample selectivity. This may be seen from the population regression function for the first-differenced subsample:

$$
\begin{aligned}
E(y_{i1} - y_{i2} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)
= (x_{i1}^* - x_{i2}^*)\beta + E(\varepsilon_{i1}^* - \varepsilon_{i2}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i).
\end{aligned}
$$

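In general, there is no reason to expect that $E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = 0$, or that $E(\varepsilon_{i1}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = E(\varepsilon_{i2}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$. In particular, for each time period the sample selection effect $\lambda_{it} \equiv E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$ depends not only on the (partially unobservable) conditioning vector $\zeta_i$, but also on the (generally unknown) joint conditional distribution of $(\varepsilon_{it}^*, u_{i1}, u_{i2})$, which may differ across individuals, as well as over time for the same individual:

$$
\begin{aligned}
\lambda_{i1} &\equiv E(\varepsilon_{i1}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) \\
&= E(\varepsilon_{i1}^* \mid u_{i1} \le w_{i1}\gamma + \eta_i,\ u_{i2} \le w_{i2}\gamma + \eta_i,\ \zeta_i) \\
&= \Lambda(w_{i1}\gamma + \eta_i,\ w_{i2}\gamma + \eta_i;\ F_{i1}(\varepsilon_{i1}^*, u_{i1}, u_{i2} \mid \zeta_i)) \\
&\equiv \Lambda_{i1}(w_{i1}\gamma + \eta_i,\ w_{i2}\gamma + \eta_i,\ \zeta_i).
\end{aligned}
$$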

Obviously the analysis carries through to the case where x is always observed which is the case most commonly treated in the literature


It is convenient to rewrite the main equation (21) as a partially linear regression

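where $\nu_{it} \equiv \varepsilon_{it} - \lambda_{it}$ is a new error term, which by construction satisfies $E(\nu_{it} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = 0$. The idea of our scheme for estimating $\beta$ is to difference out the nuisance terms $\alpha_{it}$ and $\lambda_{it}$ from the equation above.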

As a motivation of our estimation procedure consider the case where (s u) is independent and identically distributed over time and across individuals and is independent of J Under these assumptions it is easy to see that

where $\Lambda(\cdot)$ is an unknown function, the same over time and across individuals, of the single index $w_{it}\gamma + \eta_i$. Obviously, in general, $\lambda_{i1} \ne \lambda_{i2}$ unless $w_{i1}\gamma = w_{i2}\gamma$. In other words, for an individual $i$ that has $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$, the sample selection effect $\lambda_{it}$ will be the same in the two periods. Thus, for this particular individual, applying first-differences in equation (2.1) will eliminate both the unobservable effect $\alpha_i$ and the selection effect $\lambda_{it}$. At this point, it is important to notice that even if the functional form of $\Lambda$ were known (as, for example, in the case of a bivariate normal distribution; see Heckman (1976)), it would still involve the unobservable effect $\eta_i$. This suggests that it would be generally infeasible to consistently estimate $\beta$ from (2.1), even in the absence of the effect $\alpha_i$ and with knowledge of $\gamma$, unless a parametric form for the distribution of $\eta_i$ conditional on the observed exogenous variables were also specified.

The preceding argument for differencing out both nuisance terms from equation (2.1) will hold under much weaker distributional assumptions. In particular, since first-differences are taken on an individual basis, it is not required that $(\varepsilon^*_{it}, u_{it})$ be i.i.d. across individuals, nor that it be independent of the individual-specific vector $\zeta_i$. In other words, we may allow the functional form of $\Lambda$ to vary across individuals. It is also possible to allow for serial correlation in the errors. Consider, for example, the case where $(\varepsilon^*_{i1}, \varepsilon^*_{i2}, u_{i1}, u_{i2})$ and $(\varepsilon^*_{i2}, \varepsilon^*_{i1}, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$, i.e., $F(\varepsilon^*_{i1}, \varepsilon^*_{i2}, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon^*_{i2}, \varepsilon^*_{i1}, u_{i2}, u_{i1} \mid \zeta_i)$. Under this conditional exchangeability assumption, it is easy to see that, for an individual $i$ that has $w_{i1}\gamma = w_{i2}\gamma$,

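Notice that, in general, it is not sufficient to assume joint conditional stationarity of the errors. An extreme example is the case where $\varepsilon^*_{i1}$, $\varepsilon^*_{i2}$, and $u_{i1}$ are i.i.d. $N(0, 1)$ and independent of $\zeta_i$, while $u_{i2} = \varepsilon^*_{i1}$. Then $\lambda_{i1} = E(\varepsilon^*_{i1} \mid \varepsilon^*_{i1} \le w_{i2}\gamma + \eta_i) \ne \lambda_{i2} = E(\varepsilon^*_{i2})$, regardless of whether $w_{i1}\gamma = w_{i2}\gamma$.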


The above discussion, which presumes knowledge of the true $\gamma$, suggests estimating $\beta$ by OLS from a subsample that consists of those observations that have $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$. Defining $\Psi_i \equiv 1\{w_{i1}\gamma = w_{i2}\gamma\}$, $\Phi_i \equiv 1\{d_{i1} = d_{i2} = 1\} \equiv d_{i1}d_{i2}$, and with $\Delta$ denoting first differences, the OLS estimator is of the form $\hat\beta_n = [\sum_{i=1}^n \Delta x_i'\Delta x_i \Psi_i\Phi_i]^{-1}[\sum_{i=1}^n \Delta x_i'\Delta y_i \Psi_i\Phi_i]$. Under appropriate regularity conditions, this estimator will be consistent and root-n asymptotically normal. An obvious requirement is that $\Pr(\Delta w_i\gamma = 0) > 0$, which may be satisfied, for example, when all the random variables in $w_{it}$ are discrete, or in experimental cases where the distribution of $w_{it}$ is in the control of the researcher, situations that are rare in economic applications.

Of course, this estimation scheme cannot be directly implemented, since $\gamma$ is unknown. Furthermore, as argued above, it may be the case that $\Psi_i = 0$ (i.e., $\Delta w_i\gamma \ne 0$) for all individuals in our sample. Notice, though, that if $\Lambda$ is a sufficiently smooth function, and $\hat\gamma_n$ is a consistent estimate of $\gamma$, observations for which the difference $\Delta w_i\hat\gamma_n$ is close to zero should also have $\Delta\lambda_i \cong 0$, and the preceding arguments would hold approximately.

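We therefore propose the following two-step estimation procedure, which is in the spirit of Powell (1987) and Ahn and Powell (1993). In the first step, $\gamma$ is consistently estimated based on equation (2.2) alone. In the second step, the estimate $\hat\gamma_n$ is used to estimate $\beta$, based on those pairs of observations for which $w_{i1}\hat\gamma_n$ and $w_{i2}\hat\gamma_n$ are close. Specifically, we propose

$$\hat\beta_n = \Big[\sum_{i=1}^n \hat\psi_{in}\,\Delta x_i'\Delta x_i\,\Phi_i\Big]^{-1}\Big[\sum_{i=1}^n \hat\psi_{in}\,\Delta x_i'\Delta y_i\,\Phi_i\Big], \qquad (2.3)$$

where $\hat\psi_{in}$ is a weight that declines to zero as the magnitude of the difference $|w_{i1}\hat\gamma_n - w_{i2}\hat\gamma_n|$ increases. We choose kernel weights of the form

$$\hat\psi_{in} \equiv \frac{1}{h_n}K\Big(\frac{\Delta w_i\hat\gamma_n}{h_n}\Big), \qquad (2.4)$$

where $K$ is a kernel density function, and $h_n$ is a sequence of bandwidths which tends to zero as $n \to \infty$. Thus, for a fixed (nonzero) magnitude of the difference $|\Delta w_i\hat\gamma_n|$, the weight $\hat\psi_{in}$ shrinks as the sample size increases, while for a fixed $n$, a larger $|\Delta w_i\hat\gamma_n|$ corresponds to a smaller weight.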
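The second step of this procedure can be illustrated with a minimal sketch. It assumes the estimator is the kernel-weighted least squares fit of $\Delta y_i$ on $\Delta x_i$ over observations selected in both periods, with weights $(1/h_n)K(\Delta w_i\hat\gamma_n/h_n)$ and a standard normal kernel. The function and argument names (`kernel_fd_estimator`, `both_selected`, and so on) are illustrative choices and do not come from the paper.

```python
import numpy as np


def gaussian_kernel(v):
    """Standard normal density, used here as a second order kernel."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)


def kernel_fd_estimator(dy, dx, dw, gamma_hat, both_selected, h_n,
                        kernel=gaussian_kernel):
    """Kernel-weighted first-difference estimate of beta (illustrative sketch).

    dy            : (n,)   first differences of the outcome
    dx            : (n, k) first differences of the main-equation regressors
    dw            : (n, q) first differences of the selection regressors
    gamma_hat     : (q,)   first-step estimate of the selection coefficients
    both_selected : (n,)   0/1 indicator equal to d_i1 * d_i2
    h_n           : float  bandwidth
    """
    index_diff = dw @ gamma_hat                       # difference in the selection index
    weights = kernel(index_diff / h_n) / h_n * both_selected
    weighted_dx = dx * weights[:, None]
    sxx = weighted_dx.T @ dx                          # sum of psi * dx' dx * Phi
    sxy = weighted_dx.T @ dy                          # sum of psi * dx' dy * Phi
    return np.linalg.solve(sxx, sxy)
```

In practice the rows with `both_selected == 0` would carry missing values in `dy` and `dx`; this sketch assumes they have been filled with zeros or any finite placeholder, since their weight is zero.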

It is interesting to note that the arguments used in estimating the main regression equation may be modified to accommodate the case of a truncated sample, that is, when we only observe those individuals that have $d_{it} = 1$ for all time periods. Recall that our method for eliminating the sample selection effect from equation (2.1) is based on the fact that, under certain distributional assumptions, $\Delta w_i\gamma = 0$ implies $\Delta\lambda_i = 0$. However, $\Delta w_i = 0$ also implies $\Delta\lambda_i = 0$. In other words, we might dispense altogether with the first step of estimating $\gamma$, and estimate $\beta$ from those observations for which $w_{i1}$ and $w_{i2}$ are close, which would suggest using the weights $\hat\psi_{in} \equiv (1/h_n)K(\Delta w_i/h_n)$. Although this approach would imply a slower rate of convergence for the resulting estimator, this estimation scheme may be used for estimating $\beta$ from a truncated sample, in which case estimation of the selection equation is infeasible. An obvious drawback of this method is that, in order to consistently estimate the entire parameter vector $\beta$, we would have to impose the restriction that $w_{it}$ and $x^*_{it}$ do not contain any elements in common.

The above analysis extends naturally to the case of a longer (and possibly unbalanced) panel, that is, when $T_i \ge 2$. Then $\beta$ could be estimated from those observations that have $d_{it} = d_{is} = 1$ and for which $w_{it}\hat\gamma_n$ and $w_{is}\hat\gamma_n$ are close, for all $s, t = 1, \ldots, T_i$. The estimator is of the form

where

In the following section, we derive the asymptotic properties of our proposed estimator for the main equation of interest, under the assumption that $\gamma$ has been consistently estimated. At the end of the section, we examine the applicability of existing estimators for obtaining first-step estimates of the selection equation.

3. ESTIMATION OF THE MAIN EQUATION

3.1. Asymptotic Properties of the Estimator

The derivation of the large sample properties of $\hat\beta_n$ of equations (2.3) and (2.4) proceeds in two steps. First, the asymptotic behavior of the infeasible estimator, which uses the true $\gamma$ in the construction of the kernel weights, denoted by $\tilde\beta_n$, is analyzed. Then, the large sample behavior of the difference $(\hat\beta_n - \tilde\beta_n)$ is investigated.

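It will be useful to define the scalar index $W_i \equiv \Delta w_i\gamma$ and its estimated counterpart $\hat W_i \equiv \Delta w_i\hat\gamma_n$, along with the following quantities:

$$S_{xx} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\Big(\frac{W_i}{h_n}\Big)\Delta x_i'\Delta x_i\,\Phi_i, \quad S_{x\nu} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\Big(\frac{W_i}{h_n}\Big)\Delta x_i'\Delta\nu_i\,\Phi_i, \quad S_{x\lambda} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\Big(\frac{W_i}{h_n}\Big)\Delta x_i'\Delta\lambda_i\,\Phi_i,$$

with $\hat S_{xx}$, $\hat S_{x\nu}$, and $\hat S_{x\lambda}$ defined in the same way, replacing $W_i$ by $\hat W_i$; for example,

$$\hat S_{xx} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\Big(\frac{\hat W_i}{h_n}\Big)\Delta x_i'\Delta x_i\,\Phi_i.$$

With these definitions, we can write $\tilde\beta_n - \beta = S_{xx}^{-1}(S_{x\nu} + S_{x\lambda})$ and $\hat\beta_n - \beta = \hat S_{xx}^{-1}(\hat S_{x\nu} + \hat S_{x\lambda})$.

Our asymptotic results for the infeasible estimator are based on the following assumptions. From Section 2, $\Phi_i \equiv d_{i1}d_{i2}$, $\zeta_i \equiv (w_{i1}, w_{i2}, x^*_{i1}, x^*_{i2}, \alpha_i, \eta_i)$, and $\nu_{it} \equiv d_{it}\varepsilon^*_{it} - E(\varepsilon^*_{it} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$.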

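ASSUMPTION R1: $(\varepsilon^*_{i1}, \varepsilon^*_{i2}, u_{i1}, u_{i2})$ and $(\varepsilon^*_{i2}, \varepsilon^*_{i1}, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$. That is, $F(\varepsilon^*_{i1}, \varepsilon^*_{i2}, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon^*_{i2}, \varepsilon^*_{i1}, u_{i2}, u_{i1} \mid \zeta_i)$.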

As discussed in Section 2, this conditional exchangeability assumption is crucial to our method for eliminating the sample selection effect. Although in principle we could allow $F$ to vary across individuals, it will be convenient for our analysis to assume that cross-section sampling is random.

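ASSUMPTION R2: An i.i.d. sample $\{(x^*_{it}, \varepsilon^*_{it}, \alpha_i, w_{it}, \eta_i, u_{it});\ t = 1, 2\}_{i=1}^n$ is drawn from the population. For each $i = 1, \ldots, n$, and each $t = 1, 2$, we observe $(d_{it}, w_{it}, y_{it}, x_{it})$.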

With this assumption we may from now on drop the subscripts i that denote the identity of each panel member

ASSUMPTION R3: $E(\Delta x'\Delta x \mid W = 0)$ is finite and nonsingular.

Note that this assumption implicitly imposes an exclusion restriction on the set of regressors, namely, that at least one of the variables in the selection equation, $w_{it}$, is not contained in $x^*_{it}$.

ASSUMPTION R4: The marginal distribution of the index function $W \equiv \Delta w\gamma$ is absolutely continuous, with density function $f_W$, which is bounded from above on its support and strictly positive at zero, i.e., $f_W(0) > 0$. In addition, $f_W$ is almost everywhere $r$ times ($r \ge 1$) continuously differentiable and has bounded derivatives.$^{8}$

7 Observe that, by definition, $\Phi_i\Delta x_i = \Phi_i\Delta x^*_i$. Thus, although certain assumptions are stated in terms of the observed regressors $x_{it}$, they also hold for the latent (possibly unobserved) $x^*_{it}$.

8 It is possible to relax certain smoothness assumptions, so that they hold only in a neighborhood of $W$ near zero, at the cost, though, of more technical detail.


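ASSUMPTION R5: The unknown function$^{9}$ $\Lambda(w_1\gamma + \eta, w_2\gamma + \eta, \zeta) \equiv E(\varepsilon^*_1 \mid d_1 = 1, d_2 = 1, \zeta) \equiv E(\varepsilon^*_1 \mid u_1 \le w_1\gamma + \eta,\ u_2 \le w_2\gamma + \eta,\ \zeta) \equiv \Lambda(s_1, s_2, \zeta)$ satisfies $\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \tilde\Lambda\cdot(s_t - s_\tau)$ for $t, \tau = 1, 2$, where $\tilde\Lambda$ is a function of $(s_t, s_\tau, \zeta)$, i.e., $\tilde\Lambda \equiv \tilde\Lambda(s_t, s_\tau, \zeta)$, which is bounded on its support.

This assumption is crucial to our analysis. It will be satisfied, for example, if $\Lambda$ is continuously differentiable with respect to its first two arguments, with bounded first-order partial derivatives (as, for example, when the errors are jointly normally distributed), in which case we may apply the multivariate mean-value theorem:

$$\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \big[\Lambda^{(1)}(c^*) - \Lambda^{(2)}(c^*)\big](s_t - s_\tau).$$

Here $\Lambda^{(j)}$ ($j = 1, 2$) denotes the first-order partial derivative of $\Lambda$ with respect to its first and second argument, respectively, and $c^*$ lies on the line segment connecting $(w_1\gamma + \eta, w_2\gamma + \eta, \zeta)$ and $(w_2\gamma + \eta, w_1\gamma + \eta, \zeta)$. Thus, in this case, $\tilde\Lambda = \Lambda^{(1)}(c^*) - \Lambda^{(2)}(c^*)$, and by assumption will be bounded.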

ASSUMPTION R6: (a) $x^*_t$ and $\varepsilon^*_t$ have bounded $4 + 2\delta$ moments conditional on $W$, for any $\delta \in (0, 1)$.

(b) $E(\Delta x'\Delta x \mid W)$ and $E(\Delta x'\Delta x\,\Delta\nu^2 \mid W)$ are continuous at $W = 0$ and do not vanish.

(c) $E(\Delta x'\tilde\Lambda \mid W)$ is almost everywhere $r$ times continuously differentiable as a function of $W$ and has bounded derivatives.

ASSUMPTION R7: The function $K: \mathbb{R} \to \mathbb{R}$ satisfies: (a) $\int K(\nu)\,d\nu = 1$, (b) $\int |K(\nu)|\,d\nu < \infty$, (c) $\sup_\nu |K(\nu)| < \infty$, (d) $\int |\nu|^{r+1}|K(\nu)|\,d\nu < \infty$, and (e) $\int \nu^j K(\nu)\,d\nu = 0$ for all $j = 1, \ldots, r$.

ASSUMPTION R8: $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$.

From our analysis in Section 2, it is easy to see that Assumptions R1-R3 would suffice to identify $\beta$ for known $\gamma$. An identification scheme in the spirit of our discussion in Section 2 would obviously require support of $W$ at zero, as well as nonsingularity of the matrix $\Sigma_{xx}$, imposed by Assumption R3, analogous to the familiar full rank assumption.

The continuity of the distribution of the index $W$, imposed in Assumption R4, is a regularity condition common in kernel estimation of density and regression functions. It is precisely this continuity that renders the OLS estimator of Section 2 infeasible, even if $\gamma$ were known.

9 Notice that by Assumption R1 the functional form of $\Lambda$ is the same over time for the same individual, while by Assumption R2 it is also the same across individuals.

10 In principle, we could dispense with the assumption that $\tilde\Lambda$ is bounded, by assuming that it has finite fourth moment conditional on $W$.


Since our estimation scheme is based on pairs of observations for which $W_i \equiv \Delta w_i\gamma \cong 0$, it is obvious that additional smoothness conditions are required.

These are imposed by Assumptions R4-R8. Notice in particular Assumption R5, which imposes a Lipschitz continuity property on the selection correction function $\Lambda(\cdot)$. It is easy to see that simple continuity will not be sufficient to guarantee that $\Delta\lambda_i \to 0$ as $W_i \to 0$, since $\Delta\lambda_i$ is not a function of $W_i$. Furthermore, similarly to kernel density and regression estimation, a high order of differentiability $r$ for certain functions of the index $W$, along with the appropriate choice of the kernel function and the bandwidth sequence, imply a faster rate of convergence in distribution for $\tilde\beta_n$. Specifically, we choose a $(r + 1)$th order bias-reducing kernel, which by Assumption R7(e) is required to be negative in part of its domain.
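As an illustration of the moment conditions in Assumption R7(e), the sketch below checks numerically that one common fourth order kernel, $K_4(\nu) = \tfrac{1}{2}(3 - \nu^2)\phi(\nu)$, integrates to one, has vanishing first, second, and third moments (so it would correspond to $r = 3$), and takes negative values. This particular kernel is an assumption chosen for illustration; it is not necessarily the kernel used in the paper's simulations. The helper names are hypothetical.

```python
import numpy as np


def phi(v):
    """Standard normal density."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)


def k4(v):
    """A Gaussian-based fourth order kernel: 0.5 * (3 - v^2) * phi(v)."""
    return 0.5 * (3.0 - v**2) * phi(v)


def kernel_moment(kernel, j, lo=-12.0, hi=12.0, num=240001):
    """Approximate the j-th moment of a kernel by a Riemann sum on a fine grid."""
    grid = np.linspace(lo, hi, num)
    step = grid[1] - grid[0]
    return float(np.sum(grid**j * kernel(grid)) * step)


# Moments of order 0 through 4; only orders 0 and 4 should be nonzero.
moments = [kernel_moment(k4, j) for j in range(5)]
```

The standard normal density itself satisfies R7(e) only for $j = 1$, which is why it is a second order kernel.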

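The next lemma establishes the asymptotic properties of the infeasible estimator $\tilde\beta_n$.

LEMMA 1: Let Assumptions R1-R8 hold. Define

$$\Sigma_{xx} \equiv f_W(0)\,E(\Delta x'\Delta x \mid W = 0),$$

$$\Sigma_{x\nu} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Delta\nu^2 \mid W = 0)\int K(\nu)^2\,d\nu,$$

$$\Sigma_{x\lambda} \equiv \frac{1}{r!}\,g^{(r)}(0)\int \nu^{r+1}K(\nu)\,d\nu,$$

where $g^{(r)}(0)$ is the $(k \times 1)$ vector of $r$th-order derivatives of $g(W) \equiv E(\Delta x'\tilde\Lambda \mid W)\,f_W(W)$, evaluated at $W = 0$. Then:

(a) $S_{xx} \xrightarrow{p} \Sigma_{xx}$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}\,S_{x\nu} \xrightarrow{d} N(0, \Sigma_{x\nu})$, and (ii) $\sqrt{nh_n}\,S_{x\lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\lambda}$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}S_{x\nu} \xrightarrow{p} 0$, and (ii) $h_n^{-(r+1)}S_{x\lambda} \xrightarrow{p} \Sigma_{x\lambda}$.

The asymptotic properties of $\tilde\beta_n$ easily follow from the previous lemma. If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\tilde\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\tilde\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.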

In order to derive the asymptotic properties of the feasible estimator $\hat\beta_n$, we will make the following additional assumptions.

ASSUMPTIONR9 In addition to the conditions of Assumption R7 the kernel function satisfies (a) K ( v ) is three times continuously differentiable with bounded deriuatiues and (b) IKr(vgtldv lIK(v)l dv l ~ ~ K ( v ) ~ d v and ~ v ~ K ( v ) ~ ~ v are finite


The conditions of Assumption R9 are satisfied, for example, for $K(\nu)$ being the standard normal density function, which is a second order kernel.

ASSUMPTION R10: $x^*_t$, $\varepsilon^*_t$, and $w_t$ have bounded $8 + 4\delta$ moments conditional on $W$, for some $\delta \in (0, 1)$. In addition, $E(\Delta x_l\,\Delta\nu\,\Delta w_j \mid W)$ and $E(\Delta x_l\,\Delta\nu\,\Delta w_j\,\Delta w_m \mid W)$ are continuous at $W = 0$, for all $l = 1, \ldots, k$ and $j, m = 1, \ldots, q$.

ASSUMPTION R11: The parameter vector $\gamma$ in the selection equation lies in a compact$^{11}$ set, and $\hat\gamma_n$ is a consistent estimator that satisfies $\hat\gamma_n - \gamma = O_p(n^{-p})$, where $2/5 < p \le 1/2$.

For example, $p = 1/2$ if $\gamma$ is estimated by maximizing the conditional likelihood function.

ASSUMPTION R12: $h_n = h\cdot n^{-\mu}$, where $0 < h < \infty$ and $1 - 2p < \mu < p/2$.

Assumption R12 is crucial for establishing the result that follows. This result states that $\hat S_{xx}$, $\hat S_{x\nu}$, and $\hat S_{x\lambda}$ have the same probability limits as their infeasible counterparts $S_{xx}$, $S_{x\nu}$, and $S_{x\lambda}$, provided that the bandwidth sequence $h_n$ is chosen appropriately for any given rate of convergence of the first-step estimator, that is, for any given $p$, and for any degree of smoothness $r$.
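The interplay between the first-step rate $p$ (Assumption R11) and the bandwidth exponent $\mu$ (Assumption R12) can be made concrete with a small helper. This is a sketch under the reading that R11 requires $2/5 < p \le 1/2$ and R12 requires $1 - 2p < \mu < p/2$; the function name is hypothetical.

```python
def admissible_mu_range(p):
    """Open interval (lower, upper) of bandwidth exponents mu allowed for rate p.

    Assumes 2/5 < p <= 1/2 and the bounds 1 - 2p < mu < p/2.
    """
    if not (0.4 < p <= 0.5):
        raise ValueError("p must satisfy 2/5 < p <= 1/2")
    return (1.0 - 2.0 * p, p / 2.0)


# With a root-n first step (p = 1/2), any mu strictly between 0 and 1/4 is allowed.
lower, upper = admissible_mu_range(0.5)
```

The interval is nonempty exactly when $p > 2/5$, which is the lower bound on $p$ in Assumption R11.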

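LEMMA 2: Let Assumptions R1-R12 hold. Then:

(a) $\hat S_{xx} - S_{xx} = o_p(1)$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}(\hat S_{x\nu} - S_{x\nu}) = o_p(1)$, and (ii) $\sqrt{nh_n}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}(\hat S_{x\nu} - S_{x\nu}) = o_p(1)$, and (ii) $h_n^{-(r+1)}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

Lemma 2 readily implies that if $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\hat\beta_n - \tilde\beta_n) = o_p(1)$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \tilde\beta_n) = o_p(1)$. Since $(\hat\beta_n - \beta) = (\hat\beta_n - \tilde\beta_n) + (\tilde\beta_n - \beta)$, we have the following theorem.

THEOREM 1: Let Assumptions R1-R12 hold.

(a) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, with $0 \le \tilde h < \infty$, then $\sqrt{nh_n}(\hat\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.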

11 Compactness of the parameter space is required for consistency of both Manski's estimator and the smoothed conditional maximum score estimator, while it is not required for the conditional maximum likelihood estimator. Notice, though, that since $\gamma$ can only be estimated up to scale, we can always normalize it so that it lies on the unit circle. Thus, the compactness assumption is not restrictive.


Thus, in the limit, the fact that we are using $\hat\gamma_n$ to estimate $\beta$ does not affect the asymptotic distribution of $\hat\beta_n$. The lower bound on $\mu$ imposed by Assumption R12 is the key for this result to hold. In words, this bound implies that $\beta$ is estimated at a rate slower than $\gamma$. Indeed, from Theorem 1, the rate of convergence of $\hat\beta_n$ is $(nh_n)^{-1/2} = n^{-(1-\mu)/2}$, which is obviously slower than $n^{-p}$, since $\mu > 1 - 2p$. Thus, in effect, Assumption R12 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$.

In principle, we could allow $\beta$ to be estimated at the same rate as $\gamma$. Thus, if $\sqrt{nh_n}(\hat\gamma_n - \gamma) = O_p(1)$, for $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ we obtain the following asymptotic representation, which may be easily derived from the analysis of Lemma 2(b) in the Appendix:

where

n

0= ( l h ~ ) ~ ( ~ i h )plim (ln) Ax Awi Ahi Qi i = 1

provided that E(dx l A W ~ ~ I W ) at W = O and vK(v) -+O asis continuous lvl -f m Asymptotic normality of fir may still be established if K i q - y ) has an asymptotic representation of the form Jnh (T i J - y ) = l

K c ~ ( A ~ Ad y ) + 0(1)~ At first glance it looks attractive to eliminate the asymptotic bias of fin by

choosing $h_n$ so that $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h = 0$, or equivalently, by setting $\mu > 1/(2(r+1)+1)$. In that case, however, the rate of convergence of $\hat\beta_n$ is lower than when $\tilde h > 0$. Indeed, the rate of convergence in distribution of $\hat\beta_n$ is maximized by making $\mu$ as small as possible, that is, by setting $\mu = 1/(2(r+1)+1)$, in which case it becomes $n^{-(r+1)/(2(r+1)+1)}$. Thus, for $r$ large enough, the estimator converges at a rate that can be arbitrarily close to $n^{-1/2}$, provided also that $\gamma$ is estimated fast enough, that is, provided $p > (r+1)/(2(r+1)+1)$.
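The rate arithmetic in this paragraph can be checked with a few lines of code. The sketch below assumes the bandwidth sequence $h_n = h\cdot n^{-\mu}$ with $\mu = 1/(2(r+1)+1)$ and the resulting rate exponent $(1-\mu)/2 = (r+1)/(2(r+1)+1)$; the function names are illustrative.

```python
def bandwidth_exponent(r):
    """Rate-maximizing exponent mu = 1 / (2(r+1) + 1)."""
    return 1.0 / (2 * (r + 1) + 1)


def rate_exponent(r):
    """Exponent a such that the estimator converges at rate n^(-a) when mu is rate-maximizing."""
    return (r + 1) / (2 * (r + 1) + 1)


def bandwidth(n, r, h=1.0):
    """Bandwidth h_n = h * n^(-mu) for sample size n and constant h."""
    return h * n ** (-bandwidth_exponent(r))


# r = 1 (second order kernel): mu = 1/5 and the rate is n^(-2/5).
# r = 3 (fourth order kernel): mu = 1/9 and the rate is n^(-4/9).
examples = {r: (bandwidth_exponent(r), rate_exponent(r)) for r in (1, 3, 5)}
```

As $r$ grows, `rate_exponent(r)` approaches 1/2 from below, which is the sense in which the rate can be made arbitrarily close to $n^{-1/2}$.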

Although the proposed estimator is asymptotically biased, it is possible to eliminate the asymptotic bias while maintaining the maximal rate of convergence, in the manner suggested by Bierens (1987).

COROLLARY: Let $\hat\beta_n$ be the estimator with window width $h_n = h\cdot n^{-1/(2(r+1)+1)}$, and $\hat\beta_{n,\delta}$ the estimator with window width $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, where $\delta \in (0, 1)$.

12 We can also derive an asymptotic representation for i is estimated atin thc case where y

rate n- that is slower than 1 6In this case we obtain r z P ( in- 3) = XxlflnP(i- y ) + op(l) which implies that inconverges at the same rate as iwhich is slower than thc optimal rate obtained for the infeasible estimator f inthat is when y is known


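Define

$$\hat{\hat\beta}_n \equiv \frac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\,\hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}}.$$

Then $n^{(r+1)/(2(r+1)+1)}(\hat{\hat\beta}_n - \beta) \xrightarrow{d} N(0,\ h^{-1}\Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$.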
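The linear combination in the Corollary can be sketched as follows. The code assumes the weight $c = n^{-(1-\delta)(r+1)/(2(r+1)+1)}$ and the form $(\hat\beta_n - c\,\hat\beta_{n,\delta})/(1 - c)$; the function name is a hypothetical label. The small check after it uses made-up numbers to show that a leading bias proportional to $h_n^{r+1}$ cancels exactly.

```python
import numpy as np


def bias_corrected_estimate(beta_n, beta_n_delta, n, r, delta):
    """Combine two estimates computed with bandwidths h*n^(-mu) and h*n^(-delta*mu).

    beta_n       : estimate with the rate-maximizing bandwidth
    beta_n_delta : estimate with the larger (slower-shrinking) bandwidth
    """
    c = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    return (np.asarray(beta_n) - c * np.asarray(beta_n_delta)) / (1.0 - c)


# Illustration with invented numbers: true value 1.0, leading bias B * h_n^(r+1) with h = 1.
n, r, delta, true_beta, B = 1000, 1, 0.1, 1.0, 0.7
mu = 1.0 / (2 * (r + 1) + 1)
beta_n = true_beta + B * n ** (-mu * (r + 1))
beta_n_delta = true_beta + B * n ** (-delta * mu * (r + 1))
corrected = bias_corrected_estimate(beta_n, beta_n_delta, n, r, delta)
```

Because $c$ is close to one for $\delta$ near one and moderate $n$, the denominator can be small, which is one way to see why the correction may increase variability.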

In order to compute $\hat\beta_n$ or $\hat{\hat\beta}_n$ in an application, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator is likely to be more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant $h$. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.

For a given order of differentiability $r$ and a given sample size $n$, the results of Theorem 1 suggest that $h_n = h\cdot n^{-\mu}$ be chosen so that $\mu = 1/(2(r+1)+1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant $h$. A natural way to proceed (see Horowitz (1992) and Härdle (1990)) is to choose $h$ so as to minimize some kind of measure of the distance of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

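$$MSE \equiv \mathrm{trace}\Big[A\,\Sigma_{xx}^{-1}\big(h^{-1}\Sigma_{x\nu} + h^{2(r+1)}\Sigma_{x\lambda}\Sigma_{x\lambda}'\big)\Sigma_{xx}^{-1}\Big]$$

for any nonstochastic positive semidefinite matrix $A$ that satisfies $\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda} \ne 0$. It is straightforward to show that MSE is minimized by setting

$$h = h^* \equiv \left[\frac{\mathrm{trace}\big[A\,\Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1}\big]}{2(r+1)\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda}}\right]^{1/(2(r+1)+1)}. \qquad (3.21)$$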

This last expression suggests that we may construct a consistent estimate of $h^*$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{x\nu}$, and $\Sigma_{x\lambda}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{x\nu}$ and $\Sigma_{x\lambda}$.$^{13}$
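Given estimates of the three matrices, the plug-in constant can be computed directly. The sketch below assumes equation (3.21) has the form $h^* = \{\mathrm{trace}[A\Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1}] / (2(r+1)\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\Sigma_{xx}^{-1}\Sigma_{x\lambda})\}^{1/(2(r+1)+1)}$ and that $\Sigma_{xx}$ is symmetric. The function and argument names are hypothetical, and the inputs are whatever consistent estimates the user has available.

```python
import numpy as np


def optimal_bandwidth_constant(sigma_xx, sigma_xnu, sigma_xlambda, a_matrix, r):
    """Plug-in bandwidth constant minimizing the A-weighted asymptotic MSE (sketch)."""
    sxx_inv = np.linalg.inv(sigma_xx)
    # Variance contribution: trace[A Sxx^-1 Sxnu Sxx^-1]
    variance_term = float(np.trace(a_matrix @ sxx_inv @ sigma_xnu @ sxx_inv))
    # Squared-bias contribution: Sxlambda' Sxx^-1 A Sxx^-1 Sxlambda
    bias_vec = sxx_inv @ sigma_xlambda
    bias_term = float(bias_vec @ a_matrix @ bias_vec)
    if bias_term <= 0.0:
        raise ValueError("A must give the asymptotic bias a nonzero weight")
    return (variance_term / (2 * (r + 1) * bias_term)) ** (1.0 / (2 * (r + 1) + 1))


# Scalar illustration with invented inputs: h* = (2 / 4)^(1/5).
h_star = optimal_bandwidth_constant(
    np.array([[1.0]]), np.array([[2.0]]), np.array([1.0]), np.array([[1.0]]), r=1
)
```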

THEOREM2 Assume that Assumptions Rl-R12 hold (a) Let fii2be a con-sistent estimator of p based on h =h n-1(2(1+1 and define =jJ-x P

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's world wide web page.


Then

(b) Let h = h n-o(2(r)+1) where 0 lt 6 lt 1 Then for g defined as in part (a)

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given $r$ and $n$, choose any $h_n = h\cdot n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, with $h$ an arbitrary positive constant and $0 < \delta < 1$. Compute $\hat\beta_n$ based on $h_n$, and construct the residuals as defined in Theorem 2. Use them to compute the estimates of $\Sigma_{xx}$, $\Sigma_{x\nu}$, and $\Sigma_{x\lambda}$, as discussed above. Then estimate $h^*$ by $\hat h^*$, using equation (3.21) with $\Sigma_{xx}$, $\Sigma_{x\nu}$, and $\Sigma_{x\lambda}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h^*$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the plug-in method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant $h$. Second, by specifying the order of differentiability $r$, the researcher is restricted to a certain smoothness class.
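For a given bandwidth, each estimate entering the two-stage procedure can be computed as an ordinary regression on square-root-weighted first differences, together with a heteroskedasticity-robust covariance matrix. The sketch below assumes nonnegative kernel weights (for example the standard normal density, so that the square root is defined) and uses the textbook Eicker-White (HC0) sandwich form, which may differ in scaling from the expression intended in the paper. All names are illustrative.

```python
import numpy as np


def weighted_fd_regression(dy, dx, kernel_weights, both_selected):
    """OLS of sqrt(K) * dy on sqrt(K) * dx with an Eicker-White covariance (sketch).

    kernel_weights : (n,) values K(dw_i @ gamma_hat / h_n), assumed nonnegative
    both_selected  : (n,) 0/1 indicator equal to d_i1 * d_i2
    Returns the coefficient vector and its estimated covariance matrix.
    """
    kernel_weights = np.asarray(kernel_weights, dtype=float)
    if np.any(kernel_weights < 0.0):
        raise ValueError("square-root weighting requires nonnegative kernel weights")
    root_w = np.sqrt(kernel_weights) * both_selected
    y_tilde = root_w * dy
    x_tilde = root_w[:, None] * dx
    xtx_inv = np.linalg.inv(x_tilde.T @ x_tilde)
    beta_hat = xtx_inv @ x_tilde.T @ y_tilde
    resid = y_tilde - x_tilde @ beta_hat
    meat = (x_tilde * resid[:, None] ** 2).T @ x_tilde   # sum of x'x * e^2
    cov = xtx_inv @ meat @ xtx_inv
    return beta_hat, cov
```

The common factor $1/h_n$ in the kernel weights cancels in the coefficient estimate, which is why only $K(\cdot)$ enters the transformed variables.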

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h\cdot n^{-1/(2(r+1)+1)}$, run an OLS regression of $\tilde y_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta y_i\,\Phi_i$ on $\tilde x_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta x_i\,\Phi_i$, and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix, using the residuals from the regression, $\hat e_i = \tilde y_i - \tilde x_i\hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\tilde y_i$ on $\tilde x_i$ with bandwidth $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$ in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency$^{14}$ of $\hat\beta_n$ if $h_n^{-1}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e., for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$, for any $h_n$ that satisfies $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, with $0 \le \tilde h < \infty$.

The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-n asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3}h_n \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution of $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution, by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator, which under weak (but stronger than Manski's) assumptions is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness

14 Consistency of p may be established under the weaker restriction that zllF - yll = o(l) The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion This modification does not alter the basic restriction for obtaining an asymptotic distribution for 6which does not depend on the estimation of y in the first step namely that y has to be estimated at a faster rate than p Notice that in this case the upper bound on u in Assumption R12 would have to be replaced by ( 6p- 1)7 However this modification would affect the proof of Theorem 2 which would become unnecessarily complicated and long


we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)), and also in Charlier et al. (1995).

4. MONTE CARLO EVIDENCE

In this section, we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, the first step estimation method, the performance in practice of the proposed plug-in method for estimating the bandwidth constant, and finally, the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where p O = 1 y = y = 1 w and w2 are independent N( -1 l ) variables q = (w + w)2 + 25 with 5 an independent variable distributed uni- formly over the interval (01) u is logistically distributed normalized to have variance equal to 1 x= w a = + w )2 + 5 with 5 an indepen- (w dent N(0 2) variable and s = 08t3 + 06ul with 5 an independent standard normal variable All data are generated iid across individuals and over time This design implies that Pr(d + d = 1)= 037 and Pr(d = d = 1) = 031 so that approximately 37 percent of each sample is used in the first step estimation of the selection equation and approximately 31 percent in the second step Each Monte Carlo experiment is performed 1000 times while the same pseudoran- dom number sequences are used for each one of three different sample sizes n 250 1000 and 4000

Table I presents the finite sample properties of the naive estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences, using only those individuals that are selected into the sample in both time periods, i.e., those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator, with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$, at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$ (columns: Mean Bias, Median Bias, RMSE, MAD)

Panel B: Sizes of t tests (nominal levels 0.01, 0.05, 0.10, 0.20)

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(\nu) = \phi(\nu)$, where $\phi$ is the density of the standard normal distribution, which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h\cdot n^{-1/(2(r+1)+1)} = h\cdot n^{-1/5}$, with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.$^{15}$ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_L$, which in this case will be consistent, since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,$^{16}$ denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator,$^{17}$ denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$

Left: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction). Columns on each side: Mean Bias, Median Bias, RMSE, MAD.

Panel A (True $\gamma$): 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B ($\hat\gamma_L$): 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C ($\hat\gamma_{CMS}$): 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D ($\hat\gamma_{SCMS4}$): 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E ($\hat\gamma_{SCMS2}$): 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E), the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window

15 In the construction of the kernel weights of both the infeasible estimator $\tilde\beta_n$ of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i=1}^n \Delta d_i\,1\{\Delta w_{1i}g_1 + \Delta w_{2i}g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.
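The grid search described in this footnote can be sketched as follows. The code assumes the objective is the sample average of $\Delta d_i\,1\{\Delta w_{1i}g_1 + \Delta w_{2i}g_2 \ge 0\}$ with $(g_1, g_2) = (\sin g, \cos g)$; the function name and the tie-breaking rule (first maximizer on the grid) are illustrative choices rather than details from the paper.

```python
import numpy as np


def conditional_max_score(dd, dw, num_grid=2000):
    """Grid-search conditional maximum score estimate on the unit circle (sketch).

    dd : (n,)   first differences of the selection indicator, values in {-1, 0, 1}
    dw : (n, 2) first differences of the two selection regressors
    Returns the maximizing unit vector (g1, g2) and the attained score.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, num_grid)
    candidates = np.column_stack([np.sin(angles), np.cos(angles)])  # (num_grid, 2)
    index = dw @ candidates.T                                       # (n, num_grid)
    scores = (dd[:, None] * (index >= 0.0)).mean(axis=0)
    best = int(np.argmax(scores))          # first maximizer; ties are common
    return candidates[best], float(scores[best])


# Tiny illustration with invented data that a unit vector can classify perfectly.
dw_demo = np.array([[1.0, 1.0], [2.0, 0.5], [1.0, 0.2],
                    [-1.0, -1.0], [-0.5, -2.0], [-1.0, -0.3], [0.3, -0.4]])
dd_demo = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0, 0.0])
g_hat, score = conditional_max_score(dd_demo, dw_demo)
```

Because the objective is a step function, the maximizer is typically a set of angles rather than a point, so the reported value depends on how ties are broken.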

17 The SCMS estimates are computed by maximizing the smoothed objective function over all $g \in \mathbb{R}^2$ that have $|g_2| = 1$ and $g_1$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(\nu) = K_4(\nu)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(\nu) = \Phi(\nu)$, where $\Phi$ is the standard normal cumulative distribution function. In this case, the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$, and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5\cdot n^{-1/(2m+1)}$ ($m$ = 2 or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details, see Horowitz (1992) and Kyriazidou (1994).


width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.
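As a rough check on the remark that the RMSE falls by less than half when the sample size is quadrupled, suppose the estimator converges at the rate $n^{-2/5}$ stated in footnote 17 for a second order kernel. The following is a back-of-the-envelope calculation under that assumption, not a figure taken from the tables.

```latex
\[
\frac{\mathrm{RMSE}(4n)}{\mathrm{RMSE}(n)} \approx \frac{(4n)^{-2/5}}{n^{-2/5}} = 4^{-2/5} \approx 0.574
\quad\text{versus}\quad 4^{-1/2} = 0.5 \ \text{for a } \sqrt{n}\text{-consistent estimator.}
\]
```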

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\hat{\beta}_n$ and $\hat{\hat{\beta}}_n$ using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B; see footnote 18).

In Panels C and D we use a fourth and a sixth order bias-reducing kernel (see footnote 19) and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.
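To see why higher order kernels should help, the bandwidth rule above can be turned into an implied convergence rate. The sketch below assumes the estimator converges at rate $(n h_n)^{-1/2}$, which matches the second order case quoted in footnote 17 ($h_n = n^{-1/5}$ gives $n^{-2/5}$); the function names are illustrative.

```python
def bandwidth_exponent(r):
    """Exponent mu in h_n = n**(-mu) for a kernel of order r + 1."""
    return 1.0 / (2 * (r + 1) + 1)


def convergence_exponent(r):
    """Exponent a such that sqrt(n * h_n) = n**a when h_n = n**(-mu)."""
    return (1.0 - bandwidth_exponent(r)) / 2.0


# r = 1, 3, 5 correspond to second, fourth and sixth order kernels.
rates = {r: convergence_exponent(r) for r in (1, 3, 5)}
for r, a in rates.items():
    print(f"r = {r}: h_n = n^(-{bandwidth_exponent(r):.4f}), rate n^(-{a:.4f})")
```

Under this assumption the exponents are 2/5, 4/9, and 6/13, all below 1/2, and the step from the fourth to the sixth order kernel adds much less than the step from the second to the fourth.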

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat{h}$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat{h}$ is larger than the initial constant (here the initial bandwidth constant is set equal to one; see footnote 20). Table V displays the mean of $\hat{h}$ across 1000 replications for different specifications of the initial constant, for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here, the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias, we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

19 The fourth-order kernel is $K_4(v) = 1.1\exp(-v^2/2) - 0.1\exp(-v^2/(2 \cdot 11))(1/\sqrt{11})$ and the sixth-order kernel is $K_6(v) = 1.5\exp(-v^2/2) - 0.6\exp(-v^2/(2 \cdot 4))(1/\sqrt{4}) + 0.1\exp(-v^2/(2 \cdot 9))(1/\sqrt{9})$. See Bierens (1987).
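A quick numerical check that kernels of this form have the vanishing moments expected of fourth and sixth order kernels is sketched below. It reads the footnote as mixtures of normal densities with variances 1, 11 and 1, 4, 9, and it adds the factor $1/\sqrt{2\pi}$ so that each kernel integrates to one; that normalization is an assumption of this sketch, and the function names are illustrative.

```python
import math

_C = 1.0 / math.sqrt(2.0 * math.pi)  # assumed normalizing constant


def k4(v):
    """Fourth-order kernel: mixture of N(0, 1) and N(0, 11) densities."""
    return _C * (1.1 * math.exp(-v * v / 2.0)
                 - 0.1 * math.exp(-v * v / 22.0) / math.sqrt(11.0))


def k6(v):
    """Sixth-order kernel: mixture of N(0, 1), N(0, 4) and N(0, 9) densities."""
    return _C * (1.5 * math.exp(-v * v / 2.0)
                 - 0.6 * math.exp(-v * v / 8.0) / 2.0
                 + 0.1 * math.exp(-v * v / 18.0) / 3.0)


def moment(kernel, j, lo=-40.0, hi=40.0, steps=80_000):
    """Midpoint-rule approximation of the integral of v**j * kernel(v)."""
    dv = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        v = lo + (i + 0.5) * dv
        total += (v ** j) * kernel(v)
    return total * dv


# Odd moments vanish by symmetry, so only even moments are checked here.
k4_moments = [moment(k4, j) for j in (0, 2, 4)]
k6_moments = [moment(k6, j) for j in (0, 2, 4, 6)]
```

The second moment of `k4` and the second and fourth moments of `k6` come out numerically zero, while the first nonvanishing moments are of order four and six, respectively.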

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size n = 100000


TABLE III

FINITE SAMPLE PROPERTIES OF $\hat{\beta}_n$ AND $\hat{\hat{\beta}}_n$: TRUE $\gamma$

$\hat{\beta}_n$ (Without Asymptotic Bias Correction)    $\hat{\hat{\beta}}_n$ (With Asymptotic Bias Correction)

Mean Bias    Median Bias    RMSE    MAD    Mean Bias    Median Bias    RMSE    MAD

Panel A: $K(v) = \phi(v)$, $h_n = 0.5\,n^{-1/5}$
0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B: $K(v) = \phi(v)$, $h_n = 3\,n^{-1/5}$
0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C: $K(v) = K_4(v)$, $h_n = n^{-1/9}$
0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D: $K(v) = K_6(v)$, $h_n = n^{-1/13}$
0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for n = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A, and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

The estimated standard errors of the median bias estimates for n = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A, and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat{\beta}_n$ AND $\hat{\hat{\beta}}_n$: $h_n = \hat{h} \cdot n^{-1/5}$, INITIAL $h = 1$, $K(v) = \phi(v)$

$\hat{\beta}_n$ (Without Asymptotic Bias Correction)    $\hat{\hat{\beta}}_n$ (With Asymptotic Bias Correction)

Mean Bias    Median Bias    RMSE    MAD    Mean Bias    Median Bias    RMSE    MAD

Panel A: True $\gamma$
0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B: $\hat{\gamma}_L$
0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C: $\hat{\gamma}_{CMS}$
0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D: $\hat{\gamma}_{SCMS4}$
0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401


TABLE V

Initial h = 0.5    Initial h = 1    Initial h = 2    Initial h = 3

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat{h}$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat{\beta}_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat{\beta}_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the

FIGURE 1. Quantile-quantile plots of $\hat{\beta}_n$ against a Normal; $h_n = n^{-1/5}$, $K(v) = \phi(v)$. Panel shown: True $\gamma$.

Note: Figures 1a, 1d, 1g: n = 250. Figures 1b, 1e, 1h: n = 1000. Figures 1c, 1f, 1i: n = 4000.


estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.
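The quantile-quantile comparison described above can be sketched in a few lines. This is only an illustration of the construction (sample quantiles against quantiles of a normal with matching mean and variance); the function name `qq_pairs`, the plotting positions $(i + 0.5)/n$, and the simulated draws standing in for Monte Carlo estimates are all assumptions of the sketch.

```python
import random
from statistics import NormalDist, fmean, stdev


def qq_pairs(estimates):
    """Return (normal_quantile, sample_quantile) pairs for a Q-Q plot.

    The reference normal has the sample mean and sample standard
    deviation of the estimates, as in the comparison described above.
    """
    xs = sorted(estimates)
    n = len(xs)
    ref = NormalDist(mu=fmean(xs), sigma=stdev(xs))
    return [(ref.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]


# Hypothetical stand-in for 1000 Monte Carlo estimates of beta.
random.seed(0)
draws = [random.gauss(1.0, 0.2) for _ in range(1000)]
pairs = qq_pairs(draws)
```

Points lying close to the 45-degree line indicate that the normal approximation is adequate; systematic curvature in the tails would indicate otherwise.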

Finally, we examine the size of t tests, where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$. To this end, we construct t statistics for $\hat{\beta}_n$ and $\hat{\hat{\beta}}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (322). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests for $\hat{\beta}_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat{\beta}_n$ (and $\hat{\hat{\beta}}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat{\gamma}_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
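The size calculation reported in Table VI amounts to counting rejections across replications. A minimal sketch, assuming two-sided tests with standard normal critical values and using simulated standard normal t statistics as a hypothetical stand-in for the Monte Carlo output:

```python
import random
from statistics import NormalDist


def empirical_size(t_stats, levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of two-sided rejections at each nominal level."""
    std_normal = NormalDist()
    n = len(t_stats)
    sizes = {}
    for a in levels:
        crit = std_normal.inv_cdf(1.0 - a / 2.0)  # two-sided critical value
        sizes[a] = sum(abs(t) > crit for t in t_stats) / n
    return sizes


# Hypothetical t statistics that are exactly N(0, 1) under the null.
random.seed(0)
t_stats = [random.gauss(0.0, 1.0) for _ in range(20000)]
sizes = empirical_size(t_stats)
```

With well-behaved t statistics the entries of `sizes` should sit close to the nominal levels; departures of the kind reported in the table reflect the finite sample behavior of the estimator and of its standard errors.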

TABLE VI

SIZE OF t TESTS USING $\hat{\beta}_n$ AND $\hat{\hat{\beta}}_n$: $h_n = n^{-1/5}$, $K(v) = \phi(v)$

$\hat{\beta}_n$ (Without Asymptotic Bias Correction)    $\hat{\hat{\beta}}_n$ (With Asymptotic Bias Correction)

0.01    0.05    0.10    0.20    0.01    0.05    0.10    0.20

Panel A: True $\gamma$
0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B: $\hat{\gamma}_L$
0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C: $\hat{\gamma}_{CMS}$
0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D: $\hat{\gamma}_{SCMS4}$
0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250


5 CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will be also left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas which maintain Assumptions R4 and R8 of Section 3

LEMMA A1: Let $S_n = (1/n)\sum_{i=1}^{n} (1/h_n) L(W_i/h_n) Z_i W_i^s$, $s \geq 0$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution that has $E(|Z|^2 \mid W) < M < \infty$ for almost all $W$, and the function $L$ satisfies $\int |v^s L(v)| \, dv < M$. Then $E(S_n) = O(h_n^s)$ and $\mathrm{var}(S_n) = O(h_n^{2s}/(n h_n))$. Thus, for $s \geq 1$, $S_n \to_p 0$, while for $s = 0$, $S_n \to_p f_W(0) E(Z \mid W = 0) \int L(v) \, dv$, provided that $E(Z \mid W)$ is continuous at $W = 0$.
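The $s = 0$ case of the lemma can be illustrated by simulation. The sketch below is not part of the proof: it uses a hypothetical design with $W \sim N(0, 1)$, $Z = 1 + W + e$ with $e \sim N(0, 1)$, and $L$ the standard normal density, so that the stated limit $f_W(0) E(Z \mid W = 0) \int L(v) \, dv$ equals $1/\sqrt{2\pi}$; the bandwidth $h_n = n^{-1/5}$ is likewise chosen only for illustration.

```python
import math
import random


def kernel_average(w, z, h):
    """S_n = (1/n) * sum_i (1/h) * L(w_i / h) * z_i, with L the N(0, 1) density."""
    c = 1.0 / math.sqrt(2.0 * math.pi)
    total = sum(c * math.exp(-0.5 * (wi / h) ** 2) * zi for wi, zi in zip(w, z))
    return total / (len(w) * h)


random.seed(1)
n = 200_000
w = [random.gauss(0.0, 1.0) for _ in range(n)]
z = [1.0 + wi + random.gauss(0.0, 1.0) for wi in w]  # E(Z | W = 0) = 1
h = n ** (-1 / 5)
s_n = kernel_average(w, z, h)
limit = 1.0 / math.sqrt(2.0 * math.pi)  # f_W(0) * E(Z | W = 0) * integral of L
```

For this design `s_n` should be close to `limit`, with a small downward smoothing bias that shrinks as the bandwidth goes to zero.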


PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

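LEMMA A2 (Liapounov CLT for double arrays): Let $\xi_n = (1/\sqrt{n})\sum_{i=1}^{n} \xi_{in}$, where $\{\xi_{in}\}$ is an independent sequence of scalar random variables that satisfies $E(\xi_{in}) = 0$, $\mathrm{var}(\xi_{in}) < \infty$, $\mathrm{var}(\xi_n) \to V < \infty$, and $\sum_{i=1}^{n} E|\xi_{in}/\sqrt{n}|^{2+\delta} \to 0$ for some $\delta \in (0, 1)$, as $n \to \infty$. Then $\xi_n \to_d N(0, V)$.

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1973).

COROLLARY A1: Let $\xi_{in} = (1/\sqrt{h_n}) L(W_i/h_n) Z_i$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution such that $E(Z \mid W) = 0$ and $E(|Z|^{2+\delta} \mid W) < M < \infty$ for almost all $W$, $E(Z^2 \mid W)$ is continuous at $W = 0$, and the function $L$ satisfies $\int |L(v)|^{2+\delta} \, dv < \infty$. Then $\sqrt{n h_n}\, S_n = (1/\sqrt{n})\sum_{i=1}^{n} \xi_{in} \to_d N(0, f_W(0) E(Z^2 \mid W = 0) \int L(v)^2 \, dv)$.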

PROOFOF LEMMA1 (a) Apply Lemma A1 with 2= Ax Ax di (1 j = I k) s = 0 and L(v ) = K(v)

(b-i) Apply Lemma A2 with tt= c1(1 amp)K(Uh) Ax Ac where c is a k X 1 vector of constants such that cc = 1

(b-ii) Note that by Assumption R5 Ah = AWThus wc may write

S = Ax( 1 ~ 1 ) I 3 ~ = ~ ( l h ) K ( H ( h )

Therefore E(S) = l(lh)K(Wh)Wg(W) dW where g(W) - E(Axr AlW)fw(W) is by assumption r times colltinuously differentiable with derivatives that are bounded on the support of W and has g(0) lt m A Taylor series expansion of g() around 0 and a change of variables W = vh lead to

17

1


for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKSii) In what follows A4 stands for a generic constant which is the uppcr bound of certain quantities

(ii) We define the matrix norm IIAll= dtrace(AA) (iii) In the Taylor series expansions c stands for a generic value between U and

PROOFOF LEMMA2 (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a


(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that A and A are 0(1) while A = o(l) The desired result will then follow from the fact that p lt p 2 implies that h i 1 ( - y ) = Op(niL-1= o(l)

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

We will show that Bl and B are 0(1) while B3 = o(l) Thc desired result will thenfollow from the fact that 1 - 2 p lt u lt p 2 implies that hi1(- y ) = Op(nL-) = o(l) and - y ) = o(n -~-) = o(l)


Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for r r 1

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+


where Ai and A are defined as in the proof of part (b-1) As we showed there both these quantities are bounded in probability for any h that satisfies h -O and nh -t 13 as n increases Furthermore from (1) above hi1( - y) = op(l) T ~ L I S OP(nF-) = the first two terms of the sum above are o(l) Now by (21

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where Bi and B2 are defined as in the proof of part (b-ii) and as we showed there they arc houndcd in probability for any I that satisfies nh + 13 as n increases Thus the first two terms of the sum above are o(l) Furthermore

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables, and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.


Panel data selection models with latent individual effects have been most recently considered by Verbeek and Nijman (1992) and Wooldridge (1995), who proposed methods for testing and correcting for selectivity bias. A crucial assumption underlying these methods is the parameterization of the sample selection mechanism. Specifically, these authors assume that both the unobservable effect and the idiosyncratic errors in the selection process are normally distributed. The present paper is an important departure from this work, in the sense that the distributions of all unobservables are left unspecified.

We focus on the case where the data consist of a large number of individuals observed through a small number of time periods, and analyze asymptotics as the number of individuals (n) approaches infinity. Short-length panels are not only the most relevant for practical purposes; they also pose problems in estimation. In such cases, even if the individual effects are treated as parameters to be estimated, a parametric maximum likelihood approach yields inconsistent estimates, the well known incidental parameters problem.

Our method for estimating the main regression equation of interest follows the familiar two-step approach proposed by Heckman (1974, 1976) for parametric selection models, which has been used in the construction of most semiparametric estimators for such models. In the first step, the unknown coefficients of the selection equation are consistently estimated. In the second step, these estimates are used to estimate the equation of interest by a weighted least squares regression. The fixed effect from the main equation is eliminated by taking time differences on the observed selected variables, while the first-step estimates are used to construct weights, whose magnitude depends on the magnitude of the sample selection bias. For a fixed sample size, observations with less selectivity bias are given more weight, while asymptotically, only those observations with zero bias are used. This idea has been used by Powell (1987) and Ahn and Powell (1993) for the estimation of cross sectional selection models. The intuition is that, for an individual that is selected into the sample in two time periods, it is reasonable to assume that the magnitude of the selection effect in the main equation will be the same if the observed variables determining selection remain constant over time. Therefore, time differencing the outcome equation will eliminate not only its unobservable individual effect but also the sample selection effect. In fact, by imposing a linear regression structure on the latent model underlying the selection mechanism, the above argument will also hold if only the linear combination of the observed selection covariates, known up to a finite number of estimable parameters, remains constant over time. Under appropriate assumptions on the rate of convergence of the first step estimator, the proposed estimator of the main equation of interest is shown to be consistent and asymptotically normal, with a rate of convergence that can be made arbitrarily close to $n^{-1/2}$. In particular, by assuming that the selection equation is estimated at a faster rate than the main equation, we obtain a limiting distribution which does not depend on the distribution of the first step estimator.
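The second step described above can be sketched as a kernel-weighted least squares regression on first differences. The code below is a minimal illustration for a scalar regressor, not the paper's implementation: the weight $(1/h)K(\Delta w_i \hat{\gamma}/h)$, the normal kernel, and the names `kernel_weighted_fd` and `dindex` (the estimated index difference $\Delta w_i \hat{\gamma}$) are assumptions of this sketch, and the first-step estimate $\hat{\gamma}$ is taken as given.

```python
import math


def normal_kernel(v):
    """Standard normal density, used here as a second order kernel."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def kernel_weighted_fd(dy, dx, dindex, both_selected, h, kernel=normal_kernel):
    """Kernel-weighted least squares on first differences (scalar regressor).

    dy, dx:        first differences of outcome and regressor
    dindex:        first difference of the estimated selection index
    both_selected: True when the individual is observed in both periods
    h:             bandwidth; smaller values concentrate weight near dindex = 0
    """
    num = 0.0
    den = 0.0
    for dyi, dxi, di, sel in zip(dy, dx, dindex, both_selected):
        if not sel:
            continue  # only individuals selected in both periods contribute
        weight = kernel(di / h) / h
        num += weight * dxi * dyi
        den += weight * dxi * dxi
    return num / den


# Tiny noise-free check: dy = 2 * dx, so the slope is recovered exactly.
beta_hat = kernel_weighted_fd(
    dy=[2.0, -4.0, 1.0],
    dx=[1.0, -2.0, 0.5],
    dindex=[0.1, -0.3, 0.0],
    both_selected=[True, True, False],
    h=0.5,
)
```

Observations whose estimated index barely changes between periods receive the largest weights, which is how the sketch mirrors the idea that only observations with (near) zero selection bias difference are used asymptotically.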


The first step of the proposed estimation method requires that the discrete choice selection equation be estimated consistently and at a sufficiently fast rate. To this end, we propose using a smoothed version of Manski's (1987) conditional maximum score estimator (see footnote 3), which follows the approach taken by Horowitz (1992) for estimating cross section discrete choice models. Under appropriate assumptions, stronger than those in Manski (1987), the smoothed estimator improves on the rate of convergence of the original estimator and also allows standard statistical inference. Furthermore, it dispenses with parametric assumptions on the distribution of the errors, required for example by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970).

Although our analysis is based on the assumption of a censored panel with only two observations per individual, it easily generalizes to the case of a longer and possibly unbalanced panel, and may be also modified to accommodate truncated samples, in which case estimation of the selection equation is infeasible. Extensions of our estimation method to cover these situations are discussed at the end of the next section.

The paper is organized as follows. Section 2 describes the model and motivates the proposed estimation procedure. Section 3 states the assumptions and derives the asymptotic properties of the estimator. Section 4 presents the results of a Monte Carlo study investigating the small sample performance of the proposed estimator. Section 5 offers conclusions and suggests topics for future research. The proofs of theorems and lemmata are given in the Appendix.

2 THE MODEL AND THE PROPOSED ESTIMATOR

We consider the following model

(2.2) $d_{it} = 1\{w_{it}\gamma + \eta_i - u_{it} \geq 0\}$

Here, $\beta \in \Re^k$ and $\gamma \in \Re^q$ are unknown parameter vectors which we wish to estimate (see footnote 4), $x^*_{it}$ and $w_{it}$ are vectors of explanatory variables (with possibly common elements), $\alpha^*_i$ and $\eta_i$ are unobservable time-invariant individual-specific effects (possibly correlated with the regressors and the errors; see footnote 5), $\varepsilon^*_{it}$ and $u_{it}$ are unobserved disturbances (not necessarily independent of each other), while $y^*_{it} \in \Re$ is a latent variable whose observability depends on the outcome of the indicator

3 The smoothed conditional maximum score estimator for binary response panel data models, along with its asymptotic properties and necessary assumptions, is presented in an earlier version of this paper (Kyriazidou (1994)). See also Charlier, Melenberg, and van Soest (1995).

4 Obviously, constants cannot be identified in either equation, since they would be absorbed in the individual effects.

5 These will be treated as nuisance parameters and will not be estimated. Our analysis also applies to the case where $\alpha^*_i = \eta_i$.


variable $d_{it} \in \{0, 1\}$. In particular, it is assumed that, while $(d_{it}, w_{it})$ is always observed, $(y^*_{it}, x^*_{it})$ is observed only if $d_{it} = 1$ (see footnote 6). In other words, the selection variable $d_{it}$ determines whether the $it$th observation in equation (2.1) is censored or not. Thus, our problem is to estimate $\beta$ and $\gamma$ from a sample consisting of quadruples $(d_{it}, w_{it}, y_{it}, x_{it})$. We will denote the vector of (observed and unobserved) explanatory variables by $\zeta_i \equiv (w_{i1}, w_{i2}, x^*_{i1}, x^*_{i2}, \alpha^*_i, \eta_i)$. Notice that without the fixed effects $\alpha^*_i$ and $\eta_i$ our model becomes a panel data version of the well known sample selection model considered in the literature, and could be estimated by any of the existing methods. Without sample selectivity, that is, with $d_{it} = 1$ for all $i$ and $t$, equation (2.1) is the standard panel data linear regression model.
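To make the data structure concrete, the sketch below simulates a two-period panel of quadruples $(d_{it}, w_{it}, y_{it}, x_{it})$. It is purely illustrative and is not the Monte Carlo design of Section 4: it takes the outcome equation to be linear, $y^*_{it} = x^*_{it}\beta + \alpha^*_i + \varepsilon^*_{it}$, in line with the remark that (2.1) reduces to the standard linear panel model without selection, and all distributions, parameter values, and the name `simulate_panel` are assumptions of the sketch.

```python
import random


def simulate_panel(n, beta=1.0, gamma=(1.0, 1.0), seed=0):
    """Simulate a two-period panel with sample selection (illustrative only)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        eta = rng.gauss(0.0, 1.0)                # effect in the selection equation
        alpha = 0.5 * eta + rng.gauss(0.0, 1.0)  # effect in the outcome equation
        for t in (1, 2):
            w = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
            x_star = w[1] + rng.gauss(0.0, 1.0)  # correlated with a selection covariate
            u = rng.gauss(0.0, 1.0)
            eps_star = 0.8 * u + 0.6 * rng.gauss(0.0, 1.0)  # correlated with u
            # selection rule of equation (2.2)
            d = int(w[0] * gamma[0] + w[1] * gamma[1] + eta - u >= 0)
            y_star = x_star * beta + alpha + eps_star
            # y and x are recorded as zero when the observation is not selected
            rows.append({"i": i, "t": t, "d": d, "w": w,
                         "y": d * y_star, "x": d * x_star})
    return rows


panel = simulate_panel(200)
```

Because `eps_star` is correlated with `u`, the selected outcomes are not a random sample, which is the source of the selection bias discussed next.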

In our setup it is possible to estimate y in the discrete choice selection equation (22) using either the conditional maximum likelihood approach pro- posed by Rasch (1960 1961) and Andersen (1970) or the conditional maximum score method proposed by Manski (1987) On the other hand estimation of P based on the main equation of interest (21) is confronted with two problems first the presence of the unobservable effect ai=d a and second and more fundamental the potential endogeneity of the regressors xi = dix which arises from their dependence on the selection variable d and which may result in selection bias

The first problem is easily solved by noting that, for those observations that have $d_{i1} = d_{i2} = 1$, time differencing will eliminate the effect $\alpha_{it}$ from equation (2.1). This is analogous to the fixed-effects approach taken in linear panel data models. In general, though, application of standard methods, e.g. OLS, on this first-differenced subsample will yield inconsistent estimates of $\beta$, due to sample selectivity. This may be seen from the population regression function for the first-differenced subsample:

$$E(y_{i1} - y_{i2} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = (x_{i1}^* - x_{i2}^*)\beta + E(\varepsilon_{i1}^* - \varepsilon_{i2}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i).$$

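In general, there is no reason to expect that $E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = 0$, or that $E(\varepsilon_{i1}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = E(\varepsilon_{i2}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$. In particular, for each time period the sample selection effect $\lambda_{it} \equiv E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$ depends not only on the (partially unobservable) conditioning vector $\zeta_i$, but also on the (generally unknown) joint conditional distribution of $(\varepsilon_{it}^*, u_{i1}, u_{i2})$, which may differ across individuals, as well as over time for the same individual:

$$\begin{aligned}
\lambda_{it} &\equiv E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) \\
&= E(\varepsilon_{it}^* \mid u_{i1} \le w_{i1}\gamma + \eta_i,\; u_{i2} \le w_{i2}\gamma + \eta_i,\; \zeta_i) \\
&= \Lambda\big(w_{i1}\gamma + \eta_i,\; w_{i2}\gamma + \eta_i;\; F_{it}(\varepsilon_{it}^*, u_{i1}, u_{i2} \mid \zeta_i)\big) \\
&\equiv \Lambda_{it}(w_{i1}\gamma + \eta_i,\; w_{i2}\gamma + \eta_i,\; \zeta_i).
\end{aligned}$$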

Obviously, the analysis carries through to the case where $x_{it}^*$ is always observed, which is the case most commonly treated in the literature.


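It is convenient to rewrite the main equation (2.1) as a partially linear regression:

$$y_{it} = x_{it}\beta + \alpha_{it} + \lambda_{it} + \nu_{it},$$

where $\nu_{it} \equiv \varepsilon_{it} - \lambda_{it}$ is a new error term, which by construction satisfies $E(\nu_{it} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = 0$. The idea of our scheme for estimating $\beta$ is to difference out the nuisance terms $\alpha_{it}$ and $\lambda_{it}$ from the equation above.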

As a motivation of our estimation procedure, consider the case where $(\varepsilon_{it}^*, u_{it})$ is independent and identically distributed over time and across individuals and is independent of $\zeta_i$. Under these assumptions, it is easy to see that

$$\lambda_{it} = E(\varepsilon_{it}^* \mid u_{it} \le w_{it}\gamma + \eta_i) = \Lambda(w_{it}\gamma + \eta_i),$$

where $\Lambda(\cdot)$ is an unknown function, the same over time and across individuals, of the single index $w_{it}\gamma + \eta_i$. Obviously, in general, $\lambda_{i1} \neq \lambda_{i2}$ unless $w_{i1}\gamma = w_{i2}\gamma$. In other words, for an individual $i$ that has $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$, the sample selection effect $\lambda_{it}$ will be the same in the two periods. Thus, for this particular individual, applying first-differences in equation (2.1) will eliminate both the unobservable effect $\alpha_{it}$ and the selection effect $\lambda_{it}$. At this point it is important to notice that, even if the functional form of $\Lambda$ were known (as, for example, in the case of a bivariate normal distribution; see Heckman (1976)), it would still involve the unobservable effect $\eta_i$. This suggests that it would be generally infeasible to consistently estimate $\beta$ from (2.1), even in the absence of the effect $\alpha_{it}$ and with knowledge of $\gamma$, unless a parametric form for the distribution of $\eta_i$ conditional on the observed exogenous variables were also specified.

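The preceding argument for differencing out both nuisance terms from equation (2.1) will hold under much weaker distributional assumptions. In particular, since first-differences are taken on an individual basis, it is not required that $(\varepsilon_{it}^*, u_{it})$ be i.i.d. across individuals, nor that it be independent of the individual-specific vector $\zeta_i$. In other words, we may allow the functional form of $\Lambda$ to vary across individuals. It is also possible to allow for serial correlation in the errors. Consider, for example, the case where $(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2})$ and $(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$, i.e. $F(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1} \mid \zeta_i)$. Under this conditional exchangeability assumption, it is easy to see that, for an individual $i$ that has $w_{i1}\gamma = w_{i2}\gamma$, $\lambda_{i1} = \lambda_{i2}$.

Notice that, in general, it is not sufficient to assume joint conditional stationarity of the errors. An extreme example is the case where $\varepsilon_{i2}^*$, $\varepsilon_{i1}^*$ and $u_{i1}$ are i.i.d. $N(0, 1)$ and independent of $\zeta_i$, while $u_{i2} = \varepsilon_{i1}^*$. Then $\lambda_{i1} = E(\varepsilon_{i1}^* \mid \varepsilon_{i1}^* \le w_{i2}\gamma + \eta_i)$ differs from $\lambda_{i2} = E(\varepsilon_{i2}^*) = 0$, regardless of whether $w_{i1}\gamma = w_{i2}\gamma$.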


The above discussion, which presumes knowledge of the true $\gamma$, suggests estimating $\beta$ by OLS from a subsample that consists of those observations that have $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$. Defining $\Psi_i \equiv 1\{w_{i1}\gamma = w_{i2}\gamma\}$, $\Phi_i \equiv 1\{d_{i1} = d_{i2} = 1\} \equiv d_{i1}d_{i2}$, and with $\Delta$ denoting first differences, the OLS estimator is of the form

$$\hat\beta_n = \Big[\sum_{i=1}^{n} \Delta x_i'\,\Delta x_i\,\Psi_i\Phi_i\Big]^{-1}\Big[\sum_{i=1}^{n} \Delta x_i'\,\Delta y_i\,\Psi_i\Phi_i\Big].$$

Under appropriate regularity conditions, this estimator will be consistent and root-$n$ asymptotically normal. An obvious requirement is that $\Pr(\Delta w_i\gamma = 0) > 0$, which may be satisfied, for example, when all the random variables in $w_{it}$ are discrete, or in experimental cases where the distribution of $w_{it}$ is in the control of the researcher, situations that are rare in economic applications.
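The infeasible subsample estimator just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `matched_fd_ols`, the array layout (one row per individual, differences already taken) and the tolerance argument `tol` are hypothetical choices, not part of the paper, and $\gamma$ is treated as known.

```python
import numpy as np

def matched_fd_ols(dx, dy, dw, gamma, d1, d2, tol=0.0):
    """OLS on first differences using only pairs with d1 = d2 = 1
    and an (assumed known) zero difference in the selection index.

    dx: (n, k) differenced regressors, dy: (n,) differenced outcomes,
    dw: (n, q) differenced selection regressors, gamma: (q,) known vector.
    """
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    index_diff = np.asarray(dw, dtype=float) @ np.asarray(gamma, dtype=float)
    psi = np.abs(index_diff) <= tol                       # Psi_i = 1{dw_i gamma = 0}
    phi = (np.asarray(d1) == 1) & (np.asarray(d2) == 1)   # Phi_i = d_i1 * d_i2
    keep = psi & phi
    if not keep.any():
        raise ValueError("no observation has equal indices and d1 = d2 = 1")
    xs, ys = dx[keep], dy[keep]
    # Normal equations of the subsample regression of dy on dx.
    return np.linalg.solve(xs.T @ xs, xs.T @ ys)
```

With continuously distributed regressors the retained subsample is typically empty, which is exactly the difficulty that motivates kernel weighting.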

Of course, this estimation scheme cannot be directly implemented, since $\gamma$ is unknown. Furthermore, as argued above, it may be the case that $\Psi_i = 0$ (i.e. $\Delta w_i\gamma \neq 0$) for all individuals in our sample. Notice, though, that if $\Lambda$ is a sufficiently smooth function and $\hat\gamma_n$ is a consistent estimate of $\gamma$, observations for which the difference $\Delta w_i\hat\gamma_n$ is close to zero should also have $\Delta\lambda_i \cong 0$, and the preceding arguments would hold approximately.

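We therefore propose the following two-step estimation procedure, which is in the spirit of Powell (1987) and Ahn and Powell (1993). In the first step, $\gamma$ is consistently estimated based on equation (2.2) alone. In the second step, the estimate $\hat\gamma_n$ is used to estimate $\beta$, based on those pairs of observations for which $w_{i1}\hat\gamma_n$ and $w_{i2}\hat\gamma_n$ are close. Specifically, we propose

(2.3) $$\hat\beta_n = \Big[\sum_{i=1}^{n} \hat\psi_{in}\,\Delta x_i'\,\Delta x_i\,\Phi_i\Big]^{-1}\Big[\sum_{i=1}^{n} \hat\psi_{in}\,\Delta x_i'\,\Delta y_i\,\Phi_i\Big],$$

where $\hat\psi_{in}$ is a weight that declines to zero as the magnitude of the difference $|w_{i1}\hat\gamma_n - w_{i2}\hat\gamma_n|$ increases. We choose kernel weights of the form

(2.4) $$\hat\psi_{in} = \frac{1}{h_n}K\!\left(\frac{\Delta w_i\hat\gamma_n}{h_n}\right),$$

where $K$ is a kernel density function, and $h_n$ is a sequence of bandwidths which tends to zero as $n \to \infty$. Thus, for a fixed (nonzero) magnitude of the difference $|\Delta w_i\hat\gamma_n|$, the weight $\hat\psi_{in}$ shrinks as the sample size increases, while for a fixed $n$, a larger $|\Delta w_i\hat\gamma_n|$ corresponds to a smaller weight.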
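As a rough illustration of the second step, the kernel-weighted first-difference estimator $[\sum_i \hat\psi_{in}\Delta x_i'\Delta x_i\Phi_i]^{-1}[\sum_i \hat\psi_{in}\Delta x_i'\Delta y_i\Phi_i]$ with $\hat\psi_{in} = (1/h_n)K(\Delta w_i\hat\gamma_n/h_n)$ might be coded as follows. The function names, the argument layout and the default Gaussian kernel are assumptions made for this sketch; the first-step estimate `gamma_hat` is taken as given.

```python
import numpy as np

def gaussian_kernel(v):
    """Standard normal density, used here as an illustrative kernel K."""
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)

def kernel_weighted_fd(dx, dy, dw, gamma_hat, d1, d2, h, kernel=gaussian_kernel):
    """Kernel-weighted least squares on first differences.

    dx: (n, k), dy: (n,), dw: (n, q), gamma_hat: (q,), h: bandwidth > 0.
    Pairs not observed in both periods get zero weight through Phi_i.
    """
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    index_diff = np.asarray(dw, dtype=float) @ np.asarray(gamma_hat, dtype=float)
    phi = ((np.asarray(d1) == 1) & (np.asarray(d2) == 1)).astype(float)
    psi = kernel(index_diff / h) / h      # weight shrinks as |dw gamma_hat| grows
    weights = psi * phi
    weighted_dx = dx * weights[:, None]
    sxx = weighted_dx.T @ dx              # sum of psi * dx' dx * Phi
    sxy = weighted_dx.T @ dy              # sum of psi * dx' dy * Phi
    return np.linalg.solve(sxx, sxy)
```

Letting `h` grow without bound makes the weights nearly constant, so the estimate approaches plain OLS on the differenced subsample with $d_{i1} = d_{i2} = 1$.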

It is interesting to note that the arguments used in estimating the main regression equation may be modified to accommodate the case of a truncated sample, that is, when we only observe those individuals that have $d_{it} = 1$ for all time periods. Recall that our method for eliminating the sample selection effect from equation (2.1) is based on the fact that, under certain distributional assumptions, $\Delta w_i\gamma = 0$ implies $\Delta\lambda_i = 0$. However, $\Delta w_i = 0$ also implies $\Delta\lambda_i = 0$. In other words, we might dispense altogether with the first step of estimating $\gamma$, and estimate $\beta$ from those observations for which $w_{i1}$ and $w_{i2}$ are close, which would suggest using the weights $\hat\psi_{in} = (1/h_n)K(\Delta w_i/h_n)$. Although this approach would imply a slower rate of convergence for the resulting estimator, this estimation scheme may be used for estimating $\beta$ from a truncated sample, in which case estimation of the selection equation is infeasible. An obvious drawback in this method is that, in order to consistently estimate the entire parameter vector $\beta$, we would have to impose the restriction that $w_{it}$ and $x_{it}^*$ do not contain any elements in common.

The above analysis extends naturally to the case of a longer (and possibly unbalanced) panel, that is, when $T_i \ge 2$. Then $\beta$ could be estimated from those observations that have $d_{it} = d_{is} = 1$ and for which $w_{it}$ and $w_{is}$ are close, for all $s, t = 1, \ldots, T_i$. The estimator is of the form

where

In the following section we derive the asymptotic properties of our proposed estimator for the main equation of interest under the assumption that $\gamma$ has been consistently estimated. At the end of the section we examine the applicability of existing estimators for obtaining first-step estimates of the selection equation.

3. ESTIMATION OF THE MAIN EQUATION

3.1. Asymptotic Properties of the Estimator

The derivation of the large sample properties of $\hat\beta_n$ of equations (2.3) and (2.4) proceeds in two steps. First, the asymptotic behavior of the infeasible estimator, which uses the true $\gamma$ in the construction of the kernel weights, denoted by $\tilde\beta_n$, is analyzed. Then, the large sample behavior of the difference $(\hat\beta_n - \tilde\beta_n)$ is investigated.

It will be useful to define the scalar index $W_i \equiv \Delta w_i\gamma$ and its estimated counterpart $\hat W_i \equiv \Delta w_i\hat\gamma_n$, along with the following quantities:

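$$S_{xx} \equiv \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\,\Delta x_i .$$

With these definitions we can write $\tilde\beta_n - \beta = S_{xx}^{-1}(S_{x\nu} + S_{x\lambda})$ and $\hat\beta_n - \beta = \hat S_{xx}^{-1}(\hat S_{x\nu} + \hat S_{x\lambda})$.

Our asymptotic results for the infeasible estimator are based on the following assumptions. From Section 2, $\Phi_i \equiv d_{i1}d_{i2}$, $\zeta_i \equiv (w_{i1}, w_{i2}, x_{i1}^*, x_{i2}^*, \alpha_i^*, \eta_i)$, and $\nu_{it} \equiv d_{it}\varepsilon_{it}^* - E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$.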

ASSUMPTION R1: $(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2})$ and $(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$. That is, $F(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1} \mid \zeta_i)$.

As discussed in Section 2, this conditional exchangeability assumption is crucial to our method for eliminating the sample selection effect. Although in principle we could allow $F$ to vary across individuals, it will be convenient for our analysis to assume that cross-section sampling is random.

ASSUMPTION R2: An i.i.d. sample $\{(x_{it}^*, \varepsilon_{it}^*, \alpha_i^*, w_{it}, \eta_i, u_{it});\ t = 1, 2\}_{i=1}^{n}$ is drawn from the population. For each $i = 1, \ldots, n$ and each $t = 1, 2$, we observe $(d_{it}, w_{it}, y_{it}, x_{it})$.

With this assumption we may from now on drop the subscripts $i$ that denote the identity of each panel member.

ASSUMPTION R3: $E(\Delta x'\Delta x \mid W = 0)$ is finite and nonsingular.

Note that this assumption implicitly imposes an exclusion restriction on the set of regressors, namely that at least one of the variables in the selection equation, $w_{it}$, is not contained in $x_{it}^*$.

ASSUMPTION R4: The marginal distribution of the index function $W \equiv \Delta w\gamma$ is absolutely continuous, with density function $f_W$, which is bounded from above on its support and strictly positive at zero, i.e. $f_W(0) > 0$. In addition, $f_W$ is almost everywhere $r$ times ($r \ge 1$) continuously differentiable and has bounded derivatives.

Observe that, by definition, $\Delta x_i = \Phi_i\,\Delta x_i^*$. Thus, although certain assumptions are stated in terms of the observed regressors $x$, they also hold for the latent (possibly unobserved) $x^*$.

It is possible to relax certain smoothness assumptions so that they hold only in a neighborhood of $W$ near zero, at the cost, though, of more technical detail.


ASSUMPTION R5: The unknown function $\Lambda(w_1\gamma + \eta, w_2\gamma + \eta, \zeta) \equiv E(\varepsilon_1^* \mid d_1 = 1, d_2 = 1, \zeta) \equiv E(\varepsilon_1^* \mid u_1 \le w_1\gamma + \eta,\ u_2 \le w_2\gamma + \eta,\ \zeta)$ satisfies $\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \tilde\Lambda \cdot (s_t - s_\tau)$ for $t, \tau = 1, 2$, where $\tilde\Lambda$ is a function of $(s_t, s_\tau, \zeta)$, i.e. $\tilde\Lambda \equiv \tilde\Lambda(s_t, s_\tau, \zeta)$, which is bounded on its support.

This assumption is crucial to our analysis. It will be satisfied, for example, if $\Lambda$ is continuously differentiable with respect to its first two arguments, with bounded first-order partial derivatives (as, for example, when the errors are jointly normally distributed), in which case we may apply the multivariate mean-value theorem.

Here $\Lambda^{(j)}$ ($j = 1, 2$) denotes the first-order partial derivative of $\Lambda$ with respect to its first and second argument, respectively, and $c^*$ lies on the line segment connecting $(w_1\gamma + \eta, w_2\gamma + \eta, \zeta)$ and $(w_2\gamma + \eta, w_1\gamma + \eta, \zeta)$. Thus, in this case, $\tilde\Lambda = \Lambda^{(1)}(c^*) - \Lambda^{(2)}(c^*)$, and by assumption will be bounded.

ASSUMPTION R6: (a) $x_t^*$ and $\varepsilon_t^*$ have bounded $4 + 2\delta$ moments conditional on $W$, for any $\delta \in (0, 1)$.

(b) $E(\Delta x'\Delta x \mid W)$ and $E(\Delta x'\Delta x\,\Delta\nu^2 \mid W)$ are continuous at $W = 0$ and do not vanish.

(c) $E(\Delta x'\tilde\Lambda \mid W)$ is almost everywhere $r$ times continuously differentiable as a function of $W$ and has bounded derivatives.

ASSUMPTION R7: The function $K: \Re \to \Re$ satisfies: (a) $\int K(v)\,dv = 1$, (b) $\int |K(v)|\,dv < \infty$, (c) $\sup_v |K(v)| < \infty$, (d) $\int |v|^{r+1}|K(v)|\,dv < \infty$, and (e) $\int v^j K(v)\,dv = 0$ for all $j = 1, \ldots, r$.

ASSUMPTION R8: $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$.

From our analysis in Section 2, it is easy to see that Assumptions R1-R3 would suffice to identify $\beta$ for known $\gamma$. An identification scheme in the spirit of our discussion in Section 2 would obviously require support of $W$ at zero, as well as nonsingularity of the matrix $\Sigma_{xx}$, imposed by Assumption R3, analogous to the familiar full rank assumption.

The continuity of the distribution of the index $W$ imposed in Assumption R4 is a regularity condition common in kernel estimation of density and regression functions. It is precisely this continuity that renders the OLS estimator of Section 2 infeasible, even if $\gamma$ were known.

Notice that, by Assumption R1, the functional form of $\Lambda$ is the same over time for the same individual, while by Assumption R2 it is also the same across individuals.

10 In principle, we could dispense with the assumption that $\tilde\Lambda$ is bounded by assuming that it has finite fourth moment conditional on $W$.


Since our estimation scheme is based on pairs of observations for which $W_i \equiv \Delta w_i\gamma \cong 0$, it is obvious that additional smoothness conditions are required. These are imposed by Assumptions R4-R8. Notice in particular Assumption R5, which imposes a Lipschitz continuity property on the selection correction function $\Lambda(\cdot)$. It is easy to see that simple continuity will not be sufficient to guarantee that $\Delta\lambda_i \to 0$ as $W_i \to 0$, since $\Delta\lambda_i$ is not a function of $W_i$. Furthermore, similarly to kernel density and regression estimation, a high order of differentiability $r$ for certain functions of the index $W$, along with the appropriate choice of the kernel function and the bandwidth sequence, imply a faster rate of convergence in distribution for $\tilde\beta_n$. Specifically, we choose a $(r+1)$th order bias-reducing kernel, which by Assumption R7(e) is required to be negative in part of its domain.

The next lemma establishes the asymptotic properties of the infeasible estimator $\tilde\beta_n$.

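LEMMA 1: Let Assumptions R1-R8 hold. Define

$$\Sigma_{xx} \equiv f_W(0)\,E(\Delta x'\Delta x \mid W = 0),$$

$$\Sigma_{x\nu} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Delta\nu^2 \mid W = 0)\int K(v)^2\,dv,$$

where $g^{(r)}(0)$ is the $(k \times 1)$ vector of $r$th-order derivatives of

evaluated at $W = 0$. Then:

(a) $S_{xx} \xrightarrow{p} \Sigma_{xx}$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}\,S_{x\nu} \xrightarrow{d} N(0, \Sigma_{x\nu})$, and (ii) $\sqrt{nh_n}\,S_{x\lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\lambda}$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}S_{x\nu} \xrightarrow{p} 0$, and (ii) $h_n^{-(r+1)}S_{x\lambda} \xrightarrow{p} \Sigma_{x\lambda}$.

The asymptotic properties of $\tilde\beta_n$ easily follow from the previous Lemma. If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\tilde\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\tilde\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.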

In order to derive the asymptotic properties of the feasible estimator $\hat\beta_n$, we will make the following additional assumptions.

ASSUMPTIONR9 In addition to the conditions of Assumption R7 the kernel function satisfies (a) K ( v ) is three times continuously differentiable with bounded deriuatiues and (b) IKr(vgtldv lIK(v)l dv l ~ ~ K ( v ) ~ d v and ~ v ~ K ( v ) ~ ~ v are finite


The conditions of Assumption R9 are satisfied, for example, for $K(v)$ being the standard normal density function, which is a second order kernel.

ASSUMPTION R10: $x_t^*$, $\varepsilon_t^*$ and $w_t$ have bounded $8 + 4\delta$ moments conditional on $W$, for some $\delta \in (0, 1)$. In addition, $E(\Delta x_l\,\Delta\nu\,\Delta w_j \mid W)$ and $E(\Delta x_l\,\Delta\nu\,\Delta w_j\,\Delta w_m \mid W)$ are continuous at $W = 0$, for all $l = 1, \ldots, k$ and $j, m = 1, \ldots, q$.

ASSUMPTION R11: The parameter vector $\gamma$ in the selection equation lies in a compact$^{11}$ set, and $\hat\gamma_n$ is a consistent estimator that satisfies $\hat\gamma_n - \gamma = O_p(n^{-p})$, where $2/5 < p \le 1/2$.

For example, $p = 1/2$ if $\gamma$ is estimated by maximizing the conditional likelihood function.

ASSUMPTION R12: $h_n = h \cdot n^{-\mu}$, where $0 < h < \infty$ and $1 - 2p < \mu < p/2$.

Assumption R12 is crucial for establishing the result that follows. This result states that $\hat S_{xx}$, $\hat S_{x\nu}$ and $\hat S_{x\lambda}$ have the same probability limits as their infeasible counterparts $S_{xx}$, $S_{x\nu}$ and $S_{x\lambda}$, provided that the bandwidth sequence $h_n$ is chosen appropriately for any given rate of convergence of the first-step estimator, that is, for any given $p$, and for any degree of smoothness $r$.

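LEMMA 2: Let Assumptions R1-R12 hold. Then:

(a) $\hat S_{xx} - S_{xx} = o_p(1)$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}(\hat S_{x\nu} - S_{x\nu}) = o_p(1)$, and (ii) $\sqrt{nh_n}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}(\hat S_{x\nu} - S_{x\nu}) = o_p(1)$, and (ii) $h_n^{-(r+1)}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

Lemma 2 readily implies that if $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\hat\beta_n - \tilde\beta_n) = o_p(1)$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \tilde\beta_n) = o_p(1)$. Since $(\hat\beta_n - \beta) = (\hat\beta_n - \tilde\beta_n) + (\tilde\beta_n - \beta)$, we have the following theorem.

THEOREM 1: Let Assumptions R1-R12 hold.

(a) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then $\sqrt{nh_n}(\hat\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.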

11 Compactness of the parameter space is required for consistency of both Manski's estimator and the smoothed conditional maximum score estimator, while it is not required for the conditional maximum likelihood estimator. Notice, though, that since $\gamma$ can only be estimated up to scale, we can always normalize it so that it lies on the unit circle. Thus, the compactness assumption is not restrictive.


Thus, in the limit, the fact we are using $\hat\gamma_n$ to estimate $\beta$ does not affect the asymptotic distribution of $\hat\beta_n$. The lower bound on $\mu$ imposed by Assumption R12 is the key for this result to hold. In words, this bound implies that $\beta$ is estimated at a rate slower than $\gamma$. Indeed, from Theorem 1, the rate of convergence of $\hat\beta_n$ is $(nh_n)^{-1/2} \propto n^{-(1-\mu)/2}$, which is obviously slower than $n^{-p}$, since $\mu > 1 - 2p$. Thus, in effect, Assumption R12 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$.

In principle, we could allow $\beta$ to be estimated at the same rate as $\gamma$. Thus, if $\sqrt{nh_n}(\hat\gamma_n - \gamma) = O_p(1)$, for $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ we obtain the following asymptotic representation, which may be easily derived from the analysis of Lemma 2(b) in the Appendix,

where

$$\Omega \equiv \operatorname{plim}\ \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n^2}K'\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\,\Delta w_i\,\Delta\lambda_i\,\Phi_i,$$

provided that $E(\Delta x'\,\Delta w\,\tilde\Lambda\,\Phi \mid W)$ is continuous at $W = 0$ and $vK(v) \to 0$ as $|v| \to \infty$. Asymptotic normality of $\hat\beta_n$ may still be established if $\sqrt{nh_n}(\hat\gamma_n - \gamma)$ has an asymptotic representation of the form $\sqrt{nh_n}(\hat\gamma_n - \gamma) = \frac{1}{\sqrt{nh_n}}\sum_{i=1}^{n}\xi_n(\Delta w_i, \Delta d_i; \gamma) + o_p(1)$.$^{12}$

At first glance, it looks attractive to eliminate the asymptotic bias of $\hat\beta_n$ by choosing $h_n$ so that $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h = 0$, or equivalently by setting $\mu > 1/(2(r+1)+1)$. In that case, however, the rate of convergence of $\hat\beta_n$ is lower than when $\tilde h > 0$. Indeed, the rate of convergence in distribution of $\hat\beta_n$ is maximized by making $\mu$ as small as possible, that is, by setting $\mu = 1/(2(r+1)+1)$, in which case it becomes $n^{-(r+1)/(2(r+1)+1)}$. Thus, for $r$ large enough, the estimator converges at a rate that can be arbitrarily close to $n^{-1/2}$, provided also that $\gamma$ is estimated fast enough, that is, provided $p > (r+1)/(2(r+1)+1)$.
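The rate arithmetic in this paragraph is easy to tabulate. The short sketch below, whose function name `optimal_rates` is purely illustrative, returns the bandwidth exponent $1/(2(r+1)+1)$ and the implied convergence exponent $(r+1)/(2(r+1)+1)$ as exact fractions for a given smoothness order $r$; the second number is also the bound that the first-step rate $p$ must exceed.

```python
from fractions import Fraction

def optimal_rates(r):
    """Return (mu, rate) for smoothness order r >= 1.

    mu:   bandwidth exponent, h_n proportional to n**(-mu)
    rate: the main-equation estimator converges at n**(-rate);
          the first-step rate p must exceed this value.
    """
    denom = 2 * (r + 1) + 1
    return Fraction(1, denom), Fraction(r + 1, denom)

if __name__ == "__main__":
    for r in (1, 3, 5, 9):
        mu, rate = optimal_rates(r)
        print(f"r={r}: mu={mu}, rate exponent={rate} (below 1/2)")
```

For $r = 1$ the exponents are $1/5$ and $2/5$; the rate exponent increases with $r$ but always stays below $1/2$.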

Although the proposed estimator is asymptotically biased, it is possible to eliminate the asymptotic bias while maintaining the maximal rate of convergence, in the manner suggested by Bierens (1987).

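COROLLARY: Let $\hat\beta_n$ be the estimator with window width $h_n = h \cdot n^{-1/(2(r+1)+1)}$, and $\hat\beta_{n,\delta}$ the estimator with window width $h_{n,\delta} = h \cdot n^{-\delta/(2(r+1)+1)}$, where $\delta \in (0, 1)$. Define

$$\hat{\hat\beta}_n \equiv \frac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\,\hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}}.$$

Then $n^{(r+1)/(2(r+1)+1)}(\hat{\hat\beta}_n - \beta) \xrightarrow{d} N(0,\ h^{-1}\Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$.

12 We can also derive an asymptotic representation for $\hat\beta_n$ in the case where $\gamma$ is estimated at rate $n^{-p}$ that is slower than $1/\sqrt{nh_n}$. In this case we obtain $n^{p}(\hat\beta_n - \beta) = \Sigma_{xx}^{-1}\Omega\,n^{p}(\hat\gamma_n - \gamma) + o_p(1)$, which implies that $\hat\beta_n$ converges at the same rate as $\hat\gamma_n$, which is slower than the optimal rate obtained for the infeasible estimator $\tilde\beta_n$, that is, when $\gamma$ is known.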
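The linear combination in the Corollary, $(\hat\beta_n - c_n\hat\beta_{n,\delta})/(1 - c_n)$ with $c_n = n^{-(1-\delta)(r+1)/(2(r+1)+1)}$, is simple to compute once the two estimates are available. A minimal sketch follows; the function name and argument names are illustrative, and the two input estimates are assumed to have been computed with bandwidths $h\,n^{-1/(2(r+1)+1)}$ and $h\,n^{-\delta/(2(r+1)+1)}$ respectively.

```python
import numpy as np

def bias_corrected(beta_hat, beta_hat_delta, n, r, delta):
    """Combine two kernel-weighted estimates so the leading bias terms cancel.

    beta_hat:       estimate with bandwidth h * n**(-1 / (2(r+1)+1))
    beta_hat_delta: estimate with bandwidth h * n**(-delta / (2(r+1)+1))
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    c = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta_hat_delta = np.asarray(beta_hat_delta, dtype=float)
    return (beta_hat - c * beta_hat_delta) / (1.0 - c)
```

The weight `c` is chosen so that a bias proportional to the bandwidth raised to the power $r+1$ is the same in both terms of the numerator and therefore drops out.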

In order to compute $\hat\beta_n$ or $\hat{\hat\beta}_n$ in an application, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator will be likely more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant $h$. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.

For a given order of differentiability $r$ and a given sample size $n$, the results of Theorem 1 suggest that $h_n = h \cdot n^{-\mu}$ be chosen so that $\mu = 1/(2(r+1)+1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant $h$. A natural way to proceed (see Horowitz (1992) and Härdle (1990)) is to choose $h$ so as to minimize some kind of measure of the distance of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

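$$MSE \equiv \operatorname{trace}\!\left[A\,\Sigma_{xx}^{-1}\left(h^{2(r+1)}\,\Sigma_{x\lambda}\Sigma_{x\lambda}' + h^{-1}\,\Sigma_{x\nu}\right)\Sigma_{xx}^{-1}\right]$$

for any nonstochastic positive semidefinite matrix $A$ that satisfies $\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda} \neq 0$. It is straightforward to show that MSE is minimized by setting

(3.21) $$h = h^* \equiv \left[\frac{\operatorname{trace}\left[\Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1}A\right]}{2(r+1)\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda}}\right]^{1/(2(r+1)+1)}$$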
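Given estimates of the three limit matrices, the MSE-minimizing constant can be computed directly. The sketch below assumes the asymptotic MSE takes the form $h^{2(r+1)}\,b'Ab + h^{-1}\operatorname{trace}(VA)$, with bias vector $b = \Sigma_{xx}^{-1}\Sigma_{x\lambda}$ and variance $V = \Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1}$, which is what Theorem 1 implies for $h_n = h\,n^{-1/(2(r+1)+1)}$. Function and argument names are hypothetical, and the weighting matrix `A` defaults to the identity.

```python
import numpy as np

def _bias_and_variance(sigma_xx, sigma_xv, sigma_xl):
    sxx_inv = np.linalg.inv(np.atleast_2d(np.asarray(sigma_xx, dtype=float)))
    variance = sxx_inv @ np.atleast_2d(np.asarray(sigma_xv, dtype=float)) @ sxx_inv.T
    bias = sxx_inv @ np.asarray(sigma_xl, dtype=float).reshape(-1)
    return bias, variance

def asymptotic_mse(h, sigma_xx, sigma_xv, sigma_xl, r, A=None):
    """Assumed asymptotic MSE: squared bias term plus variance term."""
    bias, variance = _bias_and_variance(sigma_xx, sigma_xv, sigma_xl)
    A = np.eye(bias.size) if A is None else np.atleast_2d(np.asarray(A, dtype=float))
    return h ** (2 * (r + 1)) * (bias @ A @ bias) + np.trace(variance @ A) / h

def optimal_bandwidth_constant(sigma_xx, sigma_xv, sigma_xl, r, A=None):
    """Constant h that minimizes asymptotic_mse (first-order condition)."""
    bias, variance = _bias_and_variance(sigma_xx, sigma_xv, sigma_xl)
    A = np.eye(bias.size) if A is None else np.atleast_2d(np.asarray(A, dtype=float))
    bias_term = bias @ A @ bias
    if bias_term <= 0.0:
        raise ValueError("bias term must be strictly positive")
    ratio = np.trace(variance @ A) / (2 * (r + 1) * bias_term)
    return ratio ** (1.0 / (2 * (r + 1) + 1))
```

In a plug-in procedure the three matrices would be replaced by consistent estimates; the formula breaks down when the estimated bias term is zero, which is why that case raises an error here.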

This last expression suggests that we may construct a consistent estimate of $h^*$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{x\nu}$ and $\Sigma_{x\lambda}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{x\nu}$ and $\Sigma_{x\lambda}$.$^{13}$

THEOREM 2: Assume that Assumptions R1-R12 hold. (a) Let $\hat\beta_n$ be a consistent estimator of $\beta$ based on $h_n = h \cdot n^{-1/(2(r+1)+1)}$, and define $\Delta\hat\nu_i \equiv \Delta y_i - \Delta x_i\hat\beta_n$.

Then

(b) Let $h_{n,\delta} = h \cdot n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then, for $\Delta\hat\nu_i$ defined as in part (a),

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's world wide web page.

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given $r$ and $n$, choose any $h_n = h \cdot n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h \cdot n^{-\delta/(2(r+1)+1)}$, with $h$ an arbitrary positive constant and $0 < \delta < 1$. Compute $\hat\beta_n$ based on $h_n$, and construct $\Delta\hat\nu_i$ as defined in Theorem 2. Use $\Delta\hat\nu_i$ to compute the estimates of $\Sigma_{xx}$, $\Sigma_{x\nu}$ and $\Sigma_{x\lambda}$, as discussed above. Then estimate $h^*$ by $\hat h^*$, using equation (3.21) with $\Sigma_{xx}$, $\Sigma_{x\nu}$ and $\Sigma_{x\lambda}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h^*$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the plug-in method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant $h$. Second, by specifying the order of differentiability $r$, the researcher is restricted to a certain smoothness class.

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h \cdot n^{-1/(2(r+1)+1)}$, run OLS regression of $\hat Y_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta y_i\,\Phi_i$ on $\hat X_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta x_i$ and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix

using the residuals from the regression, $\hat e_i \equiv \hat Y_i - \hat X_i\hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\hat Y_i$ on $\hat X_i$ with bandwidth $h_n = h_{n,\delta}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$ in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency$^{14}$ of $\hat\beta_n$ if $h_n^{-1}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e. for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$.

The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-$n$ asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3}h_n \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution, by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator, which under weak (but stronger than Manski's) assumptions is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness

14 Consistency of p may be established under the weaker restriction that zllF - yll = o(l) The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion This modification does not alter the basic restriction for obtaining an asymptotic distribution for 6which does not depend on the estimation of y in the first step namely that y has to be estimated at a faster rate than p Notice that in this case the upper bound on u in Assumption R12 would have to be replaced by ( 6p- 1)7 However this modification would affect the proof of Theorem 2 which would become unnecessarily complicated and long


we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, the first step estimation method, the performance in practice of the proposed plug-in method for estimating the bandwidth constant, and finally the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where p O = 1 y = y = 1 w and w2 are independent N( -1 l ) variables q = (w + w)2 + 25 with 5 an independent variable distributed uni- formly over the interval (01) u is logistically distributed normalized to have variance equal to 1 x= w a = + w )2 + 5 with 5 an indepen- (w dent N(0 2) variable and s = 08t3 + 06ul with 5 an independent standard normal variable All data are generated iid across individuals and over time This design implies that Pr(d + d = 1)= 037 and Pr(d = d = 1) = 031 so that approximately 37 percent of each sample is used in the first step estimation of the selection equation and approximately 31 percent in the second step Each Monte Carlo experiment is performed 1000 times while the same pseudoran- dom number sequences are used for each one of three different sample sizes n 250 1000 and 4000

Table I presents the finite sample properties of the naive estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences using only those individuals that are selected into the sample both time periods, i.e. those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator, with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median


TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$ (columns: Mean Bias, Median Bias, RMSE, MAD)

Panel B: Sizes of $t$ tests (nominal levels: 0.01, 0.05, 0.10, 0.20)

bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$, at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(v) = \phi(v)$, where $\phi$ is the density of the standard normal distribution,

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction) and $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction); within each group: Mean Bias, Median Bias, RMSE, MAD

Panel A: True $\gamma$: 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B: $\hat\gamma_{L}$: 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C: $\hat\gamma_{CMS}$: 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D: $\hat\gamma_{SCMS4}$: 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E: $\hat\gamma_{SCMS2}$: 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098


which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h \cdot n^{-1/(2(r+1)+1)} = h \cdot n^{-1/5}$, with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.$^{15}$ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_{L}$, which in this case will be consistent, since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,$^{16}$ denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator,$^{17}$ denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E), the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window

15 In the construction of the kernel weights of both the infeasible estimator of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i} \Delta d_i \, 1\{\Delta w_{i1} g_1 + \Delta w_{i2} g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.
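The grid search described in this footnote can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, `dd` holds the first differences of the selection indicator (zero for non-switchers), `dw1` and `dw2` hold the first differences of two regressors, and the simulated design in the demo is invented for the example rather than taken from the paper.

```python
import math
import random

def cms_objective(g1, g2, dd, dw1, dw2):
    """(1/n) * sum_i dd_i * 1{dw1_i*g1 + dw2_i*g2 >= 0}."""
    total = sum(d for d, a, b in zip(dd, dw1, dw2) if a * g1 + b * g2 >= 0)
    return total / len(dd)

def cms_grid_search(dd, dw1, dw2, n_grid=2000):
    """Search g = (sin(t), cos(t)) over n_grid equispaced angles t in [0, 2*pi)."""
    best_val, best_t = -math.inf, 0.0
    for j in range(n_grid):
        t = 2.0 * math.pi * j / n_grid
        val = cms_objective(math.sin(t), math.cos(t), dd, dw1, dw2)
        if val > best_val:  # keeps the first maximizer; the argmax is generally a set
            best_val, best_t = val, t
    return math.sin(best_t), math.cos(best_t), best_val

if __name__ == "__main__":
    rng = random.Random(0)
    true_t = 1.0  # hypothetical true direction: gamma = (sin 1, cos 1)
    dd, dw1, dw2 = [], [], []
    for _ in range(500):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        index = a * math.sin(true_t) + b * math.cos(true_t)
        # every simulated unit is a "switcher" here; noise lets some signs disagree
        dd.append(1 if index + rng.gauss(0, 0.5) >= 0 else -1)
        dw1.append(a)
        dw2.append(b)
    print(cms_grid_search(dd, dw1, dw2))
```

Because the objective is a step function, many grid points typically attain the maximum; the sketch simply returns the first one it meets, which is one of several reasonable tie-breaking choices.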

17 The SCMS estimates are computed by maximizing the smoothed counterpart of the objective function in footnote 16, in which the indicator is replaced by a smooth function $L(\cdot)$ evaluated at the index scaled by a bandwidth $\sigma_n$, over all $g \in \mathbb{R}^2$ that have $|g_1| = 1$ and $g_2$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(v) = K_4(v)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(v) = \Phi(v)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5 \cdot n^{-1/(2h+1)}$ ($h = 2$ or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).


width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.
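The remark that the RMSE falls by less than half when the sample size is quadrupled can be checked with one line of arithmetic. Assuming the $n^{-2/5}$ rate that the footnotes give for a second order kernel, and treating the RMSE as proportional to that rate (a back-of-the-envelope approximation, not a result reported in the paper):

```latex
\frac{\mathrm{RMSE}(4n)}{\mathrm{RMSE}(n)} \approx \frac{(4n)^{-2/5}}{n^{-2/5}} = 4^{-2/5} \approx 0.574,
\qquad \text{whereas a } \sqrt{n}\text{-consistent estimator would give } 4^{-1/2} = 0.5 .
```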

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\hat\beta_n$ and $\hat{\hat\beta}_n$ using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).18

In Panels C and D we use a fourth and a sixth order bias-reducing kernel19 and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A and III-C and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.
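To make the bandwidth rule concrete, the short sketch below evaluates $h_n = h \cdot n^{-1/(2(r+1)+1)}$ for the three kernel orders and the three sample sizes used in the experiments. The function name and its default constant are illustrative choices, not notation from the paper.

```python
def bandwidth(n, r, h=1.0):
    """h_n = h * n**(-1/(2*(r+1)+1)); r = 1, 3, 5 for a kernel of order 2, 4, 6."""
    return h * n ** (-1.0 / (2 * (r + 1) + 1))

if __name__ == "__main__":
    for r in (1, 3, 5):
        exponent_denominator = 2 * (r + 1) + 1  # 5, 9, 13
        values = [round(bandwidth(n, r), 4) for n in (250, 1000, 4000)]
        print(f"r = {r}: h_n = n^(-1/{exponent_denominator}) ->", values)
```

Higher order kernels pair with a bandwidth that shrinks more slowly in $n$, which is why the exponents move from $1/5$ to $1/9$ and $1/13$.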

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h$ is larger than the initial constant (here the initial bandwidth constant is set equal to one20). Table V displays the mean of $\hat h$ across 1000 replications for different specifications of the initial constant, for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here, the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).
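A minimal sketch of this calculation follows. It assumes that the rule-of-thumb bandwidth is the Gaussian reference rule $1.06\,\hat\sigma R^{-1/5}$ for $R$ replications, and that the standard error of the median uses the usual asymptotic formula $1/(2\hat f(m)\sqrt{R})$; both are assumptions of this example, and the function names are hypothetical.

```python
import math
import random
import statistics

def rule_of_thumb_bandwidth(sample):
    """Gaussian reference rule: 1.06 * sd * R**(-1/5) (assumed form of the rule of thumb)."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)

def gaussian_kde_at(x0, sample, bandwidth):
    """Kernel density estimate at x0 using a standard normal kernel."""
    total = sum(math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in sample)
    return total / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

def median_standard_error(sample):
    """Asymptotic standard error of the sample median: 1 / (2 * f(median) * sqrt(R))."""
    med = statistics.median(sample)
    f_med = gaussian_kde_at(med, sample, rule_of_thumb_bandwidth(sample))
    return 1.0 / (2.0 * f_med * math.sqrt(len(sample)))

if __name__ == "__main__":
    rng = random.Random(1)
    # stand-in for 1000 Monte Carlo estimates; the normal draws are purely illustrative
    draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    print("SE of mean:  ", statistics.stdev(draws) / math.sqrt(len(draws)))
    print("SE of median:", median_standard_error(draws))
```

For approximately normal estimates the median's standard error should exceed that of the mean by a factor near $\sqrt{\pi/2} \approx 1.25$, which is roughly the ratio between the median and mean bias standard errors reported under Table III.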

19 The fourth-order kernel is $K_4(v) = 1.1\exp(-v^2/2) - 0.1\exp(-v^2/(2\cdot 11))(1/\sqrt{11})$ and the sixth-order kernel is $K_6(v) = 1.5\exp(-v^2/2) - 0.6\exp(-v^2/(2\cdot 4))(1/\sqrt{4}) + 0.1\exp(-v^2/(2\cdot 9))(1/\sqrt{9})$. See Bierens (1987).
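The bias-reducing property of these kernels can be verified numerically: a kernel of order four should integrate to one and have a vanishing second moment, and a kernel of order six should in addition have a vanishing fourth moment. The sketch below assumes the kernels are mixtures of normal densities, so each exponential term carries the usual $1/\sqrt{2\pi}$ normalization; that factor and the helper names are assumptions of this example.

```python
import math

def phi(v, sigma=1.0):
    """Normal density with mean 0 and standard deviation sigma."""
    return math.exp(-0.5 * (v / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def k4(v):
    """Fourth-order kernel: weights 1.1 and -0.1 on variances 1 and 11."""
    return 1.1 * phi(v) - 0.1 * phi(v, math.sqrt(11.0))

def k6(v):
    """Sixth-order kernel: weights 1.5, -0.6, 0.1 on variances 1, 4, 9."""
    return 1.5 * phi(v) - 0.6 * phi(v, 2.0) + 0.1 * phi(v, 3.0)

def moment(kernel, j, lo=-40.0, hi=40.0, steps=40_000):
    """Midpoint-rule approximation of the integral of v**j * kernel(v) over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        v = lo + (i + 0.5) * dx
        total += (v ** j) * kernel(v)
    return total * dx

if __name__ == "__main__":
    for name, kernel in (("K4", k4), ("K6", k6)):
        print(name, [round(moment(kernel, j), 6) for j in (0, 2, 4, 6)])
```

The weights work because they solve the moment conditions directly: for the fourth-order kernel, $1.1 - 0.1 = 1$ and $1.1\cdot 1 - 0.1\cdot 11 = 0$; for the sixth-order kernel, $1.5 - 0.6 + 0.1 = 1$, $1.5 - 0.6\cdot 4 + 0.1\cdot 9 = 0$, and $1.5 - 0.6\cdot 16 + 0.1\cdot 81 = 0$.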

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size n = 100000.


TABLE III

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: TRUE $\gamma$

$\hat\beta_n$ (Without Asymptotic Bias Correction)    $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)

Mean Bias    Median Bias    RMSE    MAD    Mean Bias    Median Bias    RMSE    MAD

Panel A: $K(v) = \phi(v)$, $h_n = 0.5 \cdot n^{-1/5}$
0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B: $K(v) = \phi(v)$, $h_n = 3 \cdot n^{-1/5}$
0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C: $K(v) = K_4(v)$, $h_n = n^{-1/9}$
0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D: $K(v) = K_6(v)$, $h_n = n^{-1/13}$
0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for n = 250, 1000, and 4000 are 0.0110, 0.0061, 0.0035 for Panel A and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

The estimated standard errors of the median bias estimates for n = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = \hat h \cdot n^{-1/5}$, INITIAL $h = 1$, $K(v) = \phi(v)$

$\hat\beta_n$ (Without Asymptotic Bias Correction)    $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)

Mean Bias    Median Bias    RMSE    MAD    Mean Bias    Median Bias    RMSE    MAD

Panel A (True $\gamma$): 0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B ($\hat\gamma_L$): 0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C ($\hat\gamma_{CMS}$): 0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D ($\hat\gamma_{SCMS}$): 0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401


TABLE V

Initial $h = 0.5$    Initial $h = 1$    Initial $h = 2$    Initial $h = 3$

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h_n = n^{-1/5}$, $K(v) = \phi(v)$. Panel row shown: true $\gamma$ (Figures 1a, 1b, 1c). Note: Figures 1a, 1d, 1g: n = 250. Figures 1b, 1e, 1h: n = 1000. Figures 1c, 1f, 1i: n = 4000.


estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.
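The quantile-quantile comparison described here can be reproduced from any vector of Monte Carlo estimates. The sketch below pairs each ordered estimate with the corresponding quantile of a normal distribution that has the same mean and variance; the plotting positions $(i + 0.5)/R$ and the function name are choices made for this example, and the simulated "estimates" in the demo are placeholders.

```python
import random
import statistics

def qq_points(estimates):
    """Return (normal quantile, empirical quantile) pairs for a Q-Q plot."""
    xs = sorted(estimates)
    n = len(xs)
    # reference normal with the sample mean and sample standard deviation
    ref = statistics.NormalDist(statistics.fmean(xs), statistics.stdev(xs))
    theoretical = [ref.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, xs))

if __name__ == "__main__":
    rng = random.Random(0)
    # placeholder for 1000 replications of an estimator centered near beta = 1
    fake_estimates = [1.0 + 0.1 * rng.gauss(0.0, 1.0) for _ in range(1000)]
    points = qq_points(fake_estimates)
    worst_gap = max(abs(t - e) for t, e in points)
    print("largest vertical distance from the 45-degree line:", worst_gap)
```

Points that lie close to the 45-degree line indicate that the normal approximation is adequate; systematic curvature in the tails would indicate otherwise.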

Finally, we examine the size of t tests, where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$. To this end we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (322). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic, since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
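The rejection frequencies reported in such a table are simple to compute once the t statistics from each replication are in hand. The following sketch assumes two-sided tests against standard normal critical values; the function name and the toy inputs are illustrative.

```python
import random
import statistics

def empirical_size(t_stats, levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of replications with |t| above the two-sided normal critical value."""
    std_normal = statistics.NormalDist()
    sizes = {}
    for alpha in levels:
        critical = std_normal.inv_cdf(1.0 - alpha / 2.0)
        sizes[alpha] = sum(abs(t) > critical for t in t_stats) / len(t_stats)
    return sizes

if __name__ == "__main__":
    rng = random.Random(0)
    # placeholder: t statistics that are exactly N(0, 1) under the null
    t_stats = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    for alpha, size in empirical_size(t_stats).items():
        print(f"nominal {alpha:.2f} -> actual {size:.3f}")
```

Each replication's t statistic would be formed as $(\hat\beta_n - \beta_0)$ divided by its estimated standard error; the closer the actual frequencies are to the nominal levels, the better the asymptotic approximation.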

TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = n^{-1/5}$, $K(v) = \phi(v)$

$\hat\beta_n$ (Without Asymptotic Bias Correction)    $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)

0.01    0.05    0.10    0.20    0.01    0.05    0.10    0.20

Panel A (True $\gamma$): 0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B ($\hat\gamma_L$): 0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C ($\hat\gamma_{CMS}$): 0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D ($\hat\gamma_{SCMS}$): 0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250


5 CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

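LEMMA A1: Let $S_n = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h_n} L\left(\frac{W_i}{h_n}\right) Z_i W_i^{s}$, $s \ge 0$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution that has $E(|Z|^2 \mid W) < M < \infty$ for almost all $W$, and the function $L$ satisfies $\int |v^{s} L(v)|\,dv < M$. Then $E(S_n) = O(h_n^{s})$ and $\mathrm{var}(S_n) = O(h_n^{2s}/(n h_n))$. Thus, for $s \ge 1$, $S_n \stackrel{p}{\to} 0$, while for $s = 0$, $S_n \stackrel{p}{\to} f_W(0) E(Z \mid W = 0) \int L(v)\,dv$, provided that $E(Z \mid W)$ is continuous at $W = 0$.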


PROOF: Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

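LEMMA A2 (Liapounov CLT for double arrays): Let $\bar\xi_n = \frac{1}{n}\sum_{i=1}^{n} \xi_{in}$, where $\{\xi_{in}\}$ is an independent sequence of scalar random variables that satisfies $E(\xi_{in}) = 0$, $\mathrm{var}(\xi_{in}) < \infty$, $\mathrm{var}(\sqrt{n}\,\bar\xi_n) \to V < \infty$, and $\sum_{i=1}^{n} E|\xi_{in}/\sqrt{n}|^{2+\delta} \to 0$ for some $\delta \in (0,1)$, as $n \to \infty$. Then $\sqrt{n}\,\bar\xi_n \stackrel{d}{\to} N(0, V)$.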

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1974).

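COROLLARY A1: Let $\xi_{in} = \frac{1}{\sqrt{h_n}} L\left(\frac{W_i}{h_n}\right) Z_i$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution such that $E(Z \mid W) = 0$ and $E(|Z|^{2+\delta} \mid W) < M < \infty$ for almost all $W$, $E(Z^2 \mid W)$ is continuous at $W = 0$, and the function $L$ satisfies $\int |L(v)|^{2+\delta}\,dv < \infty$. Then $\sqrt{n}\,\bar\xi_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \xi_{in} \stackrel{d}{\to} N\left(0,\; f_W(0) E(Z^2 \mid W = 0) \int L(v)^2\,dv\right)$.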

PROOFOF LEMMA1 (a) Apply Lemma A1 with 2= Ax Ax di (1 j = I k) s = 0 and L(v ) = K(v)

(b-i) Apply Lemma A2 with tt= c1(1 amp)K(Uh) Ax Ac where c is a k X 1 vector of constants such that cc = 1

(b-ii) Note that by Assumption R5 Ah = AWThus wc may write

S = Ax( 1 ~ 1 ) I 3 ~ = ~ ( l h ) K ( H ( h )

Therefore E(S) = l(lh)K(Wh)Wg(W) dW where g(W) - E(Axr AlW)fw(W) is by assumption r times colltinuously differentiable with derivatives that are bounded on the support of W and has g(0) lt m A Taylor series expansion of g() around 0 and a change of variables W = vh lead to

17

1


for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$. (iii) In the Taylor series expansions, c stands for a generic value between U and

PROOFOF LEMMA2 (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a


(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that A and A are 0(1) while A = o(l) The desired result will then follow from the fact that p lt p 2 implies that h i 1 ( - y ) = Op(niL-1= o(l)

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

We will show that Bl and B are 0(1) while B3 = o(l) Thc desired result will thenfollow from the fact that 1 - 2 p lt u lt p 2 implies that hi1(- y ) = Op(nL-) = o(l) and - y ) = o(n -~-) = o(l)


Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for r r 1

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+


where Ai and A are defined as in the proof of part (b-1) As we showed there both these quantities are bounded in probability for any h that satisfies h -O and nh -t 13 as n increases Furthermore from (1) above hi1( - y) = op(l) T ~ L I S OP(nF-) = the first two terms of the sum above are o(l) Now by (21

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where Bi and B2 are defined as in the proof of part (b-ii) and as we showed there they arc houndcd in probability for any I that satisfies nh + 13 as n increases Thus the first two terms of the sum above are o(l) Furthermore

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons--A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HARDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamics Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.


The first step of the proposed estimation method requires that the discrete choice selection equation be estimated consistently and at a sufficiently fast rate. To this end, we propose using a smoothed version of Manski's (1987) conditional maximum score estimator,3 which follows the approach taken by Horowitz (1992) for estimating cross section discrete choice models. Under appropriate assumptions, stronger than those in Manski (1987), the smoothed estimator improves on the rate of convergence of the original estimator and also allows standard statistical inference. Furthermore, it dispenses with parametric assumptions on the distribution of the errors, required for example by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970).

Although our analysis is based on the assumption of a censored panel with only two observations per individual, it easily generalizes to the case of a longer and possibly unbalanced panel, and may also be modified to accommodate truncated samples, in which case estimation of the selection equation is infeasible. Extensions of our estimation method to cover these situations are discussed at the end of the next section.

The paper is organized as follows. Section 2 describes the model and motivates the proposed estimation procedure. Section 3 states the assumptions and derives the asymptotic properties of the estimator. Section 4 presents the results of a Monte Carlo study investigating the small sample performance of the proposed estimator. Section 5 offers conclusions and suggests topics for future research. The proofs of theorems and lemmata are given in the Appendix.

2 THE MODEL AND THE PROPOSED ESTIMATOR

We consider the following model

(2.2) $d_{it} = 1\{w_{it}\gamma + \eta_i - u_{it} \ge 0\}$.

Here, $\beta \in \mathbb{R}^k$ and $\gamma \in \mathbb{R}^q$ are unknown parameter vectors which we wish to estimate,4 $x_{it}^*$ and $w_{it}$ are vectors of explanatory variables (with possibly common elements), $\alpha_i^*$ and $\eta_i$ are unobservable time-invariant individual-specific effects5 (possibly correlated with the regressors and the errors), $\varepsilon_{it}^*$ and $u_{it}$ are unobserved disturbances (not necessarily independent of each other), while $y_{it}^* \in \mathbb{R}$ is a latent variable whose observability depends on the outcome of the indicator

3 The smoothed conditional maximum score estimator for binary response panel data models, along with its asymptotic properties and necessary assumptions, is presented in an earlier version of this paper (Kyriazidou (1994)). See also Charlier, Melenberg, and van Soest (1995).

4 Obviously, constants cannot be identified in either equation, since they would be absorbed in the individual effects.

5 These will be treated as nuisance parameters and will not be estimated. Our analysis also applies to the case where $\alpha_i^* = \eta_i$.


variable $d_{it} \in \{0, 1\}$. In particular, it is assumed that, while $(d_{it}, w_{it})$ is always observed, $(y_{it}^*, x_{it}^*)$ is observed only6 if $d_{it} = 1$. In other words, the selection variable $d_{it}$ determines whether the $it$th observation in equation (2.1) is censored or not. Thus, our problem is to estimate $\beta$ and $\gamma$ from a sample consisting of quadruples $(d_{it}, w_{it}, y_{it}, x_{it})$. We will denote the vector of (observed and unobserved) explanatory variables by $\zeta_i \equiv (w_{i1}, w_{i2}, x_{i1}^*, x_{i2}^*, \alpha_i^*, \eta_i)$. Notice that without the fixed effects $\alpha_i^*$ and $\eta_i$ our model becomes a panel data version of the well known sample selection model considered in the literature, and could be estimated by any of the existing methods. Without sample selectivity, that is, with $d_{it} = 1$ for all $i$ and $t$, equation (2.1) is the standard panel data linear regression model.

In our setup, it is possible to estimate $\gamma$ in the discrete choice selection equation (2.2) using either the conditional maximum likelihood approach proposed by Rasch (1960, 1961) and Andersen (1970), or the conditional maximum score method proposed by Manski (1987). On the other hand, estimation of $\beta$ based on the main equation of interest (2.1) is confronted with two problems: first, the presence of the unobservable effect $\alpha_{it} \equiv d_{it}\alpha_i^*$; and second and more fundamental, the potential endogeneity of the regressors $x_{it} \equiv d_{it}x_{it}^*$, which arises from their dependence on the selection variable $d_{it}$, and which may result in selection bias.

The first problem is easily solved by noting that for those observations that have $d_{i1} = d_{i2} = 1$, time differencing will eliminate the effect $\alpha_{it}$ from equation (2.1). This is analogous to the fixed-effects approach taken in linear panel data models. In general, though, application of standard methods, e.g. OLS, on this first-differenced subsample will yield inconsistent estimates of $\beta$, due to sample selectivity. This may be seen from the population regression function for the first-differenced subsample:

$$E(y_{i1} - y_{i2} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = (x_{i1}^* - x_{i2}^*)\beta + E(\varepsilon_{i1}^* - \varepsilon_{i2}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i).$$

In general, there is no reason to expect that $E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = 0$ or that $E(\varepsilon_{i1}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = E(\varepsilon_{i2}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$. In particular, for each time period the sample selection effect $\lambda_{it} \equiv E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$ depends not only on the (partially unobservable) conditioning vector $\zeta_i$, but also on the (generally unknown) joint conditional distribution of $(\varepsilon_{it}^*, u_{i1}, u_{i2})$, which may differ across individuals, as well as over time for the same individual:

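$$\begin{aligned}
\lambda_{it} &\equiv E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) \\
&= E(\varepsilon_{it}^* \mid u_{i1} \le w_{i1}\gamma + \eta_i,\; u_{i2} \le w_{i2}\gamma + \eta_i,\; \zeta_i) \\
&= \Lambda(w_{i1}\gamma + \eta_i,\; w_{i2}\gamma + \eta_i;\; F_{it}(\varepsilon_{it}^*, u_{i1}, u_{i2} \mid \zeta_i)) \\
&\equiv \Lambda_{it}(w_{i1}\gamma + \eta_i,\; w_{i2}\gamma + \eta_i,\; \zeta_i).
\end{aligned}$$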

6 Obviously, the analysis carries through to the case where $x_{it}^*$ is always observed, which is the case most commonly treated in the literature.


It is convenient to rewrite the main equation (2.1) as a partially linear regression:

where $\nu_{it} \equiv \varepsilon_{it} - \lambda_{it}$ is a new error term, which by construction satisfies $E(\nu_{it} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = 0$. The idea of our scheme for estimating $\beta$ is to difference out the nuisance terms $\alpha_{it}$ and $\lambda_{it}$ from the equation above.

As a motivation of our estimation procedure, consider the case where $(\varepsilon_{it}^*, u_{it})$ is independent and identically distributed over time and across individuals, and is independent of $\zeta_i$. Under these assumptions, it is easy to see that

$$\lambda_{it} = E(\varepsilon_{it}^* \mid u_{it} \le w_{it}\gamma + \eta_i) = \Lambda(w_{it}\gamma + \eta_i),$$

where $\Lambda(\cdot)$ is an unknown function, the same over time and across individuals, of the single index $w_{it}\gamma + \eta_i$. Obviously, in general $\lambda_{i1} \ne \lambda_{i2}$ unless $w_{i1}\gamma = w_{i2}\gamma$. In other words, for an individual $i$ that has $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$, the sample selection effect $\lambda_{it}$ will be the same in the two periods. Thus, for this particular individual, applying first-differences in equation (2.1) will eliminate both the unobservable effect $\alpha_{it}$ and the selection effect $\lambda_{it}$. At this point, it is important to notice that even if the functional form of $\Lambda$ were known (as, for example, in the case of a bivariate normal distribution; see Heckman (1976)), it would still involve the unobservable effect $\eta_i$. This suggests that it would be generally infeasible to consistently estimate $\beta$ from (2.1), even in the absence of the effect $\alpha_{it}$ and with knowledge of $\gamma$, unless a parametric form for the distribution of $\eta_i$ conditional on the observed exogenous variables were also specified.

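The preceding argument for differencing out both nuisance terms from equation (2.1) will hold under much weaker distributional assumptions. In particular, since first-differences are taken on an individual basis, it is not required that $(\varepsilon_{it}^*, u_{it})$ be i.i.d. across individuals, nor that it be independent of the individual-specific vector $\zeta_i$. In other words, we may allow the functional form of $\Lambda$ to vary across individuals. It is also possible to allow for serial correlation in the errors. Consider, for example, the case where $(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2})$ and $(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$, i.e. $F(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1} \mid \zeta_i)$. Under this "conditional exchangeability" assumption, it is easy to see that for an individual $i$ that has $w_{i1}\gamma = w_{i2}\gamma$,

$$\lambda_{i1} = E(\varepsilon_{i1}^* \mid u_{i1} \le w_{i1}\gamma + \eta_i,\ u_{i2} \le w_{i2}\gamma + \eta_i,\ \zeta_i) = E(\varepsilon_{i2}^* \mid u_{i2} \le w_{i1}\gamma + \eta_i,\ u_{i1} \le w_{i2}\gamma + \eta_i,\ \zeta_i) = \lambda_{i2}.$$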

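Notice that, in general, it is not sufficient to assume joint conditional stationarity of the errors. An extreme example is the case where $\varepsilon_{i1}^*$, $\varepsilon_{i2}^*$ and $u_{i1}$ are i.i.d. $N(0, 1)$ and independent of $\zeta_i$, while $u_{i2} = \varepsilon_{i1}^*$. Then $\lambda_{i1} = E(\varepsilon_{i1}^* \mid \varepsilon_{i1}^* \le w_{i2}\gamma + \eta_i) \ne \lambda_{i2} = E(\varepsilon_{i2}^*)$, regardless of whether $w_{i1}\gamma = w_{i2}\gamma$.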


The above discussion, which presumes knowledge of the true $\gamma$, suggests estimating $\beta$ by OLS from a subsample that consists of those observations that have $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$. Defining $\Psi_i \equiv 1\{w_{i1}\gamma = w_{i2}\gamma\}$, $\Phi_i \equiv 1\{d_{i1} = d_{i2} = 1\} = d_{i1}d_{i2}$, and with $\Delta$ denoting first differences, the OLS estimator is of the form

$$\hat\beta_n = \Big[\sum_{i=1}^n \Delta x_i'\Delta x_i\Psi_i\Phi_i\Big]^{-1}\Big[\sum_{i=1}^n \Delta x_i'\Delta y_i\Psi_i\Phi_i\Big].$$

Under appropriate regularity conditions, this estimator will be consistent and root-n asymptotically normal. An obvious requirement is that $\Pr(\Delta w_i\gamma = 0) > 0$, which may be satisfied, for example, when all the random variables in $w_{it}$ are discrete, or in experimental cases where the distribution of $w_{it}$ is in the control of the researcher, situations that are rare in economic applications.

Of course, this estimation scheme cannot be directly implemented, since $\gamma$ is unknown. Furthermore, as argued above, it may be the case that $\Psi_i = 0$ (i.e. $\Delta w_i\gamma \ne 0$) for all individuals in our sample. Notice, though, that if $\Lambda$ is a sufficiently "smooth" function and $\hat\gamma_n$ is a consistent estimate of $\gamma$, observations for which the difference $\Delta w_i\hat\gamma_n$ is close to zero should also have $\Delta\lambda_i \cong 0$, and the preceding arguments would hold approximately.

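We therefore propose the following two-step estimation procedure, which is in the spirit of Powell (1987) and Ahn and Powell (1993). In the first step, $\gamma$ is consistently estimated based on equation (2.2) alone. In the second step, the estimate $\hat\gamma_n$ is used to estimate $\beta$, based on those pairs of observations for which $w_{i1}\hat\gamma_n$ and $w_{i2}\hat\gamma_n$ are "close". Specifically, we propose

$$\hat\beta_n = \Big[\sum_{i=1}^n \hat\psi_{in}\,\Delta x_i'\Delta x_i\,\Phi_i\Big]^{-1}\Big[\sum_{i=1}^n \hat\psi_{in}\,\Delta x_i'\Delta y_i\,\Phi_i\Big], \tag{2.3}$$

where $\hat\psi_{in}$ is a weight that declines to zero as the magnitude of the difference $|w_{i1}\hat\gamma_n - w_{i2}\hat\gamma_n|$ increases. We choose "kernel" weights of the form

$$\hat\psi_{in} \equiv \frac{1}{h_n}K\!\left(\frac{\Delta w_i\hat\gamma_n}{h_n}\right), \tag{2.4}$$

where $K$ is a "kernel density" function, and $h_n$ is a sequence of "bandwidths" which tends to zero as $n \to \infty$. Thus, for a fixed (nonzero) magnitude of the difference $|\Delta w_i\hat\gamma_n|$, the weight $\hat\psi_{in}$ shrinks as the sample size increases, while for a fixed $n$, a larger $|\Delta w_i\hat\gamma_n|$ corresponds to a smaller weight.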
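The second step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the author's code: the function names `gaussian_kernel` and `kernel_weighted_fd`, the array layout, and the choice of the standard normal density as $K$ are assumptions made for the example, and the first-step estimate is simply passed in as `gamma_hat`.

```python
import numpy as np


def gaussian_kernel(v):
    """Standard normal density, used here as an illustrative choice of K."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)


def kernel_weighted_fd(dy, dx, dw, phi, gamma_hat, h):
    """Kernel-weighted first-difference estimate of beta.

    dy: (n,) first differences of y; dx: (n, k) first differences of x;
    dw: (n, q) first differences of w; phi: (n,) indicator d_i1 * d_i2;
    gamma_hat: (q,) first-step estimate; h: bandwidth h_n > 0.
    Rows with phi == 0 must still hold finite numbers (e.g. zeros).
    """
    index = dw @ gamma_hat                 # estimated index difference
    psi = gaussian_kernel(index / h) / h   # kernel weights (1/h) K(index/h)
    wts = psi * phi                        # only pairs selected in both periods count
    sxx = (dx * wts[:, None]).T @ dx       # sum_i psi_i * dx_i' dx_i * phi_i
    sxy = (dx * wts[:, None]).T @ dy       # sum_i psi_i * dx_i' dy_i * phi_i
    return np.linalg.solve(sxx, sxy)
```

Because the factor $1/h_n$ multiplies both bracketed sums, it cancels in the estimate; it is kept in the sketch only to mirror the form of the weights.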

It is interesting to note that the arguments used in estimating the main regression equation may be modified to accommodate the case of a truncated sample, that is, when we only observe those individuals that have $d_{it} = 1$ for all time periods. Recall that our method for eliminating the sample selection effect from equation (2.1) is based on the fact that, under certain distributional assumptions, $\Delta w_i\gamma = 0$ implies $\Delta\lambda_i = 0$. However, $\Delta w_i = 0$ also implies $\Delta\lambda_i = 0$. In other words, we might dispense altogether with the first step of estimating $\gamma$, and estimate $\beta$ from those observations for which $w_{i1}$ and $w_{i2}$ are "close", which would suggest using the weights $\psi_{in} \equiv (1/h_n)K(\Delta w_i/h_n)$. Although this approach would imply a slower rate of convergence for the resulting estimator, this estimation scheme may be used for estimating $\beta$ from a truncated sample, in which case estimation of the selection equation is infeasible. An obvious drawback in this method is that, in order to consistently estimate the entire parameter vector $\beta$, we would have to impose the restriction that $w_{it}$ and $x_{it}^*$ do not contain any elements in common.

The above analysis extends naturally to the case of a longer (and possibly unbalanced) panel, that is, when $T_i \ge 2$. Then $\beta$ could be estimated from those observations that have $d_{it} = d_{is} = 1$ and for which $w_{it}\hat\gamma_n$ and $w_{is}\hat\gamma_n$ are "close", for all $s, t = 1, \dots, T_i$. The estimator is of the form

where

In the following section we derive the asymptotic properties of our proposed estimator for the main equation of interest, under the assumption that $\gamma$ has been consistently estimated. At the end of the section we examine the applicability of existing estimators for obtaining first-step estimates of the selection equation.

3. ESTIMATION OF THE MAIN EQUATION

3.1. Asymptotic Properties of the Estimator

The derivation of the large sample properties of $\hat\beta_n$ of equations (2.3) and (2.4) proceeds in two steps. First, the asymptotic behavior of the infeasible estimator, which uses the true $\gamma$ in the construction of the kernel weights, denoted by $\tilde\beta_n$, is analyzed. Then, the large sample behavior of the difference $(\hat\beta_n - \tilde\beta_n)$ is investigated.

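It will be useful to define the scalar index $W_i \equiv \Delta w_i\gamma$ and its estimated counterpart $\hat W_i \equiv \Delta w_i\hat\gamma_n$, along with the following quantities:

$$S_{xx} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\Delta x_i\Phi_i, \qquad S_{xv} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\Delta v_i\Phi_i, \qquad S_{x\lambda} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\Delta\lambda_i\Phi_i,$$

and $\hat S_{xx}$, $\hat S_{xv}$, $\hat S_{x\lambda}$, defined in the same way with $\hat W_i$ in place of $W_i$.

With these definitions, we can write $\tilde\beta_n - \beta = S_{xx}^{-1}(S_{xv} + S_{x\lambda})$ and $\hat\beta_n - \beta = \hat S_{xx}^{-1}(\hat S_{xv} + \hat S_{x\lambda})$.

Our asymptotic results for the infeasible estimator are based on the following assumptions. From Section 2, $\Phi_i \equiv d_{i1}d_{i2}$, $\zeta_i \equiv (w_{i1}, w_{i2}, x_{i1}^*, x_{i2}^*, \alpha_i^*, \eta_i)$, and $v_{it} \equiv \varepsilon_{it}^* - E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$.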

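ASSUMPTION R1: $(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2})$ and $(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$. That is, $F(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1} \mid \zeta_i)$.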

As discussed in Section 2, this "conditional exchangeability" assumption is crucial to our method for eliminating the sample selection effect. Although in principle we could allow $F$ to vary across individuals, it will be convenient for our analysis to assume that cross-section sampling is random.

ASSUMPTION R2: An i.i.d. sample $(x_{it}^*, \varepsilon_{it}^*, \alpha_i^*, \eta_i, w_{it}, u_{it})$, $t = 1, 2$, is drawn from the population. For each $i = 1, \dots, n$, and each $t = 1, 2$, we observe $(d_{it}, w_{it}, y_{it}, x_{it})$.

With this assumption, we may from now on drop the subscripts $i$ that denote the identity of each panel member.

ASSUMPTION R3: $E(\Delta x'\Delta x\,\Phi \mid W = 0)$ is finite and nonsingular.

Note that this assumption implicitly imposes an exclusion restriction on the set of regressors, namely that at least one of the variables in the selection equation, $w_{it}$, is not contained in $x_{it}^*$.

ASSUMPTION R4: The marginal distribution of the index function $W \equiv \Delta w\gamma$ is absolutely continuous, with density function $f_W$, which is bounded from above on its support and strictly positive at zero, i.e. $f_W(0) > 0$. In addition, $f_W$ is almost everywhere $r$ times ($r \ge 1$) continuously differentiable and has bounded derivatives.

Observe that by definition $\Delta x_i = \Phi_i\Delta x_i^*$. Thus, although certain assumptions are stated in terms of the observed regressors $x_{it}$, they also hold for the latent (possibly unobserved) $x_{it}^*$.

It is possible to relax certain smoothness assumptions so that they hold only in a neighborhood of $W$ near zero, at the cost, though, of more technical detail.


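ASSUMPTION R5: The unknown function $\Lambda(w_1\gamma + \eta, w_2\gamma + \eta, \zeta) \equiv E(\varepsilon_1^* \mid d_1 = 1, d_2 = 1, \zeta) = E(\varepsilon_1^* \mid u_1 \le w_1\gamma + \eta, u_2 \le w_2\gamma + \eta, \zeta) \equiv \Lambda(s_1, s_2, \zeta)$ satisfies $\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \tilde\Lambda\cdot(s_t - s_\tau)$ for $t, \tau = 1, 2$, where $\tilde\Lambda$ is a function of $(s_t, s_\tau, \zeta)$, i.e. $\tilde\Lambda \equiv \tilde\Lambda(s_t, s_\tau, \zeta)$, which is bounded on its support.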

This assumption is crucial to our analysis. It will be satisfied, for example, if $\Lambda$ is continuously differentiable with respect to its first two arguments, with bounded first-order partial derivatives (as, for example, when the errors are jointly normally distributed), in which case we may apply the multivariate mean-value theorem:

$$\Lambda(s_1, s_2, \zeta) - \Lambda(s_2, s_1, \zeta) = \big(\Lambda^{(1)}(c^*) - \Lambda^{(2)}(c^*)\big)(s_1 - s_2), \qquad s_t \equiv w_t\gamma + \eta.$$

Here $\Lambda^{(j)}$ ($j = 1, 2$) denotes the first-order partial derivative of $\Lambda$ with respect to its first and second argument, respectively, and $c^*$ lies on the line segment connecting $(w_1\gamma + \eta, w_2\gamma + \eta)$ and $(w_2\gamma + \eta, w_1\gamma + \eta)$. Thus, in this case $\tilde\Lambda = \Lambda^{(1)}(c^*) - \Lambda^{(2)}(c^*)$, and by assumption will be bounded.

ASSUMPTION R6: (a) $x_t^*$ and $\varepsilon_t^*$ have bounded $4 + 2\delta$ moments conditional on $W$, for any $\delta \in (0, 1)$.

(b) $E(\Delta x'\Delta x\,\Phi \mid W)$ and $E(\Delta x'\Delta x\,\Delta v^2\,\Phi \mid W)$ are continuous at $W = 0$ and do not vanish.

(c) $E(\Delta x'\tilde\Lambda\,\Phi \mid W)$ is almost everywhere $r$ times continuously differentiable as a function of $W$ and has bounded derivatives.

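ASSUMPTION R7: The function $K: \mathbb{R} \to \mathbb{R}$ satisfies: (a) $\int K(\nu)\,d\nu = 1$, (b) $\int |K(\nu)|\,d\nu < \infty$, (c) $\sup_\nu |K(\nu)| < \infty$, (d) $\int |\nu|^{r+1}|K(\nu)|\,d\nu < \infty$, and (e) $\int \nu^j K(\nu)\,d\nu = 0$ for all $j = 1, \dots, r$.

ASSUMPTION R8: $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$.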

From our analysis in Section 2, it is easy to see that Assumptions R1-R3 would suffice to identify $\beta$ for known $\gamma$. An identification scheme in the spirit of our discussion in Section 2 would obviously require support of $W$ at zero, as well as nonsingularity of the matrix $\Sigma_{xx}$ imposed by Assumption R3, analogous to the familiar full rank assumption.

The continuity of the distribution of the index $W$ imposed in Assumption R4 is a regularity condition, common in kernel estimation of density and regression functions. It is precisely this continuity that renders the OLS estimator of Section 2 infeasible, even if $\gamma$ were known.

Notice that, by Assumption R1, the functional form of $\Lambda$ is the same over time for the same individual, while by Assumption R2 it is also the same across individuals.

10 In principle, we could dispense with the assumption that $\tilde\Lambda$ is bounded, by assuming that it has finite fourth moment conditional on $W$.


Since our estimation scheme is based on pairs of observations for which $W_i \equiv \Delta w_i\gamma \cong 0$, it is obvious that additional smoothness conditions are required. These are imposed by Assumptions R4-R8. Notice in particular Assumption R5, which imposes a Lipschitz continuity property on the selection correction function $\Lambda(\cdot)$. It is easy to see that simple continuity will not be sufficient to guarantee that $\Delta\lambda_i \to 0$ as $W_i \to 0$, since $\Delta\lambda_i$ is not a function of $W_i$. Furthermore, similarly to kernel density and regression estimation, a high order of differentiability $r$ for certain functions of the index $W$, along with the appropriate choice of the kernel function and the bandwidth sequence, imply a faster rate of convergence in distribution for $\tilde\beta_n$. Specifically, we choose a "$(r + 1)$th order bias-reducing" kernel, which by Assumption R7(e) is required to be negative in part of its domain.
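To make the moment conditions of Assumption R7 concrete, the following sketch checks them numerically for one Gaussian-based fourth-order kernel, $K(\nu) = (3/2 - \nu^2/2)\phi(\nu)$. This particular kernel is an illustrative assumption and is not claimed to be the one used in the paper; the helper names `k4` and `kernel_moment` are likewise hypothetical.

```python
import numpy as np


def phi(v):
    """Standard normal density."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)


def k4(v):
    """Gaussian-based kernel whose first three moments vanish (illustrative)."""
    return (1.5 - 0.5 * v**2) * phi(v)


def kernel_moment(kernel, j, lo=-12.0, hi=12.0, m=200001):
    """Approximate the integral of v**j * kernel(v) by the trapezoid rule."""
    v = np.linspace(lo, hi, m)
    f = v**j * kernel(v)
    step = v[1] - v[0]
    return step * (f.sum() - 0.5 * (f[0] + f[-1]))


# Moments 0..4: approximately 1, 0, 0, 0 and -3, so R7(a) and R7(e) hold with r = 3.
moments = [kernel_moment(k4, j) for j in range(5)]
```

The kernel integrates to one and its moments of order 1 to 3 vanish, while the fourth moment does not, so it corresponds to $r = 3$; it is negative for $|\nu| > \sqrt{3}$, in line with the remark that such kernels take negative values on part of their domain.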

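The next lemma establishes the asymptotic properties of the infeasible estimator $\tilde\beta_n$.

LEMMA 1: Let Assumptions R1-R8 hold. Define

$$\Sigma_{xx} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Phi \mid W = 0),$$

$$\Sigma_{xv} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Delta v^2\,\Phi \mid W = 0)\int K(\nu)^2\,d\nu,$$

$$\Sigma_{x\lambda} \equiv \frac{1}{r!}\,g^{(r)}(0)\int \nu^{r+1}K(\nu)\,d\nu,$$

where $g^{(r)}(0)$ is the $(k \times 1)$ vector of $r$th-order derivatives of $g(W) \equiv E(\Delta x'\tilde\Lambda\,\Phi \mid W)\,f_W(W)$ evaluated at $W = 0$. Then:

(a) $S_{xx} \xrightarrow{p} \Sigma_{xx}$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}\,S_{xv} \xrightarrow{d} N(0, \Sigma_{xv})$ and (ii) $\sqrt{nh_n}\,S_{x\lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\lambda}$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}S_{xv} \xrightarrow{p} 0$ and (ii) $h_n^{-(r+1)}S_{x\lambda} \xrightarrow{p} \Sigma_{x\lambda}$.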

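The asymptotic properties of $\tilde\beta_n$ easily follow from the previous Lemma. If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\tilde\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\tilde\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.

In order to derive the asymptotic properties of the feasible estimator $\hat\beta_n$, we will make the following additional assumptions.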

ASSUMPTION R9: In addition to the conditions of Assumption R7, the kernel function satisfies: (a) $K(\nu)$ is three times continuously differentiable with bounded derivatives, and (b) $\int |K'(\nu)|\,d\nu$, $\int |K''(\nu)|\,d\nu$, $\int \nu^2|K'(\nu)|\,d\nu$ and $\int \nu^2|K''(\nu)|\,d\nu$ are finite.


The conditions of Assumption R9 are satisfied, for example, for $K(\nu)$ being the standard normal density function, which is a second order kernel.

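ASSUMPTION R10: $x_t^*$, $\varepsilon_t^*$ and $w_t$ have bounded $8 + 4\delta$ moments conditional on $W$, for some $\delta \in (0, 1)$. In addition, $E(\Delta x_l\,\Delta v\,\Delta w_j \mid W)$ and $E(\Delta x_l\,\Delta v\,\Delta w_j\,\Delta w_m \mid W)$ are continuous at $W = 0$, for all $l = 1, \dots, k$ and $j, m = 1, \dots, q$.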

ASSUMPTION R11: The parameter vector $\gamma$ in the selection equation lies in a compact set, and $\hat\gamma_n$ is a consistent estimator that satisfies $\hat\gamma_n - \gamma = O_p(n^{-p})$, where $2/5 < p \le 1/2$.

For example, $p = 1/2$ if $\gamma$ is estimated by maximizing the conditional likelihood function.

ASSUMPTION R12: $h_n = h\cdot n^{-\mu}$, where $0 < h < \infty$ and $1 - 2p < \mu < p/2$.

Assumption R12 is crucial for establishing the result that follows. This result states that $\hat S_{xx}$, $\hat S_{xv}$ and $\hat S_{x\lambda}$ have the same probability limits as their infeasible counterparts, $S_{xx}$, $S_{xv}$ and $S_{x\lambda}$, provided that the bandwidth sequence $h_n$ is chosen appropriately for any given rate of convergence of the first-step estimator, that is, for any given $p$, and for any degree of smoothness $r$.

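LEMMA 2: Let Assumptions R1-R12 hold. Then:

(a) $\hat S_{xx} - S_{xx} = o_p(1)$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}(\hat S_{xv} - S_{xv}) = o_p(1)$ and (ii) $\sqrt{nh_n}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}(\hat S_{xv} - S_{xv}) = o_p(1)$ and (ii) $h_n^{-(r+1)}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.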

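Lemma 2 readily implies that if $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\hat\beta_n - \tilde\beta_n) = o_p(1)$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \tilde\beta_n) = o_p(1)$. Since $(\hat\beta_n - \beta) = (\hat\beta_n - \tilde\beta_n) + (\tilde\beta_n - \beta)$, we have the following theorem.

THEOREM 1: Let Assumptions R1-R12 hold.

(a) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, with $0 \le \tilde h < \infty$, then $\sqrt{nh_n}(\hat\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.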

11 Compactness of the parameter space is required for consistency of both Manski's estimator and the smoothed conditional maximum score estimator, while it is not required for the conditional maximum likelihood estimator. Notice, though, that since $\gamma$ can only be estimated up to scale, we can always normalize it so that it lies on the unit circle. Thus, the compactness assumption is not restrictive.


Thus, in the limit, the fact that we are using $\hat\gamma_n$ to estimate $\beta$ does not affect the asymptotic distribution of $\hat\beta_n$. The lower bound on $\mu$ imposed by Assumption R12 is the key for this result to hold. In words, this bound implies that $\beta$ is estimated at a rate slower than $\gamma$. Indeed, from Theorem 1, the rate of convergence of $\hat\beta_n$ is $(nh_n)^{-1/2} = n^{-(1-\mu)/2}$, which is obviously slower than $n^{-p}$, since $\mu > 1 - 2p$. Thus, in effect, Assumption R12 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$.

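In principle, we could allow $\beta$ to be estimated at the same rate as $\gamma$. Thus, if $\sqrt{nh_n}(\hat\gamma_n - \gamma) = O_p(1)$, for $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ we obtain the following asymptotic representation, which may be easily derived from the analysis of Lemma 2(b) in the Appendix:

$$\sqrt{nh_n}(\hat\beta_n - \beta) = \sqrt{nh_n}(\tilde\beta_n - \beta) + \Sigma_{xx}^{-1}\,\Omega\,\sqrt{nh_n}(\hat\gamma_n - \gamma) + o_p(1),$$

where

$$\Omega \equiv \operatorname{plim}\frac{1}{n}\sum_{i=1}^n \frac{1}{h_n^2}K'\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\,\Delta w_i\,\Delta\lambda_i\,\Phi_i,$$

provided that $E(\Delta x'\Delta w\,\tilde\Lambda\,\Phi \mid W)$ is continuous at $W = 0$ and $\nu K(\nu) \to 0$ as $|\nu| \to \infty$. Asymptotic normality of $\hat\beta_n$ may still be established if $\sqrt{nh_n}(\hat\gamma_n - \gamma)$ has an asymptotic representation of the form $\sqrt{nh_n}(\hat\gamma_n - \gamma) = (1/\sqrt{nh_n})\sum_i \varphi_n(\Delta w_i, \Delta d_i; \gamma) + o_p(1)$.

At first glance, it looks attractive to eliminate the asymptotic bias of $\hat\beta_n$ by choosing $h_n$ so that $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h = 0$, or equivalently by setting $\mu > 1/(2(r+1)+1)$. In that case, however, the rate of convergence of $\hat\beta_n$ is lower than when $\tilde h > 0$. Indeed, the rate of convergence in distribution of $\hat\beta_n$ is maximized by making $\mu$ as small as possible, that is, by setting $\mu = 1/(2(r+1)+1)$, in which case it becomes $n^{-(r+1)/(2(r+1)+1)}$. Thus, for $r$ large enough, the estimator converges at a rate that can be arbitrarily close to $n^{-1/2}$, provided also that $\gamma$ is estimated fast enough, that is, provided $p > (r+1)/(2(r+1)+1)$.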
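As a worked instance of the rate arithmetic above, the block below substitutes three values of $r$ (chosen only for illustration) into $\mu = 1/(2(r+1)+1)$, the resulting rate $n^{-(1-\mu)/2} = n^{-(r+1)/(2(r+1)+1)}$, and the lower bound on $p$ implied by $\mu > 1 - 2p$.

```latex
\begin{array}{llll}
r = 1: & \mu = \tfrac{1}{5},  & \text{rate } n^{-2/5},  & p > \tfrac{2}{5}, \\[4pt]
r = 3: & \mu = \tfrac{1}{9},  & \text{rate } n^{-4/9},  & p > \tfrac{4}{9}, \\[4pt]
r = 5: & \mu = \tfrac{1}{13}, & \text{rate } n^{-6/13}, & p > \tfrac{6}{13}.
\end{array}
```

In each row the exponent approaches $1/2$ from below as $r$ grows, which is the sense in which the rate can be made arbitrarily close to $n^{-1/2}$.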

Although the proposed estimator is asymptotically biased, it is possible to eliminate the asymptotic bias while maintaining the maximal rate of convergence, in the manner suggested by Bierens (1987).

COROLLARY: Let $\hat\beta_n$ be the estimator with window width $h_n = h\cdot n^{-1/(2(r+1)+1)}$, and $\hat\beta_{n,\delta}$ the estimator with window width $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, where $\delta \in (0, 1)$.

12 We can also derive an asymptotic representation for $\hat\beta_n$ in the case where $\gamma$ is estimated at rate $n^{-p}$ that is slower than $1/\sqrt{nh_n}$. In this case we obtain $n^p(\hat\beta_n - \beta) = \Sigma_{xx}^{-1}\Omega\,n^p(\hat\gamma_n - \gamma) + o_p(1)$, which implies that $\hat\beta_n$ converges at the same rate as $\hat\gamma_n$, which is slower than the optimal rate obtained for the infeasible estimator $\tilde\beta_n$, that is, when $\gamma$ is known.


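Define

$$\hat{\hat\beta}_n \equiv \frac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\,\hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}}.$$

Then $n^{(r+1)/(2(r+1)+1)}(\hat{\hat\beta}_n - \beta) \xrightarrow{d} N(0,\ h^{-1}\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.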
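The combination in the Corollary is simple enough to sketch directly. The function names below are hypothetical, and the two input estimates are assumed to have been computed beforehand with the two bandwidths returned by `corollary_bandwidths`.

```python
import numpy as np


def corollary_bandwidths(n, r, delta, h=1.0):
    """Bandwidths h_n and h_{n,delta} for a given constant h (sketch)."""
    mu = 1.0 / (2 * (r + 1) + 1)
    return h * n ** (-mu), h * n ** (-delta * mu)


def bias_corrected(beta_n, beta_n_delta, n, r, delta):
    """Linear combination of the two estimates, as in the Corollary."""
    c = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    return (np.asarray(beta_n) - c * np.asarray(beta_n_delta)) / (1.0 - c)
```

The weight `c` equals the ratio $(h_n/h_{n,\delta})^{r+1}$ of the two leading bias terms, so a bias proportional to $h^{r+1}$ cancels exactly in the numerator, which is the mechanism the Corollary relies on.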

In order to compute $\hat\beta_n$ or $\hat{\hat\beta}_n$ in an application, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator will be likely more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant $h$. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.

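For a given order of differentiability $r$ and a given sample size $n$, the results of Theorem 1 suggest that $h_n = h\cdot n^{-\mu}$ be chosen so that $\mu = 1/(2(r+1)+1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant $h$. A natural way to proceed (see Horowitz (1992) and Härdle (1990)) is to choose $h$ so as to minimize some kind of measure of the "distance" of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

$$MSE \equiv n^{-2(r+1)/(2(r+1)+1)}\,\operatorname{trace}\!\left[A\,\Sigma_{xx}^{-1}\left(h^{-1}\Sigma_{xv} + h^{2(r+1)}\Sigma_{x\lambda}\Sigma_{x\lambda}'\right)\Sigma_{xx}^{-1}\right]$$

for any nonstochastic positive semidefinite matrix $A$ that satisfies $\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda} \ne 0$. It is straightforward to show that MSE is minimized by setting

$$h = h^* \equiv \left[\frac{\operatorname{trace}\left[A\,\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}\right]}{2(r+1)\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda}}\right]^{1/(2(r+1)+1)}. \tag{3.21}$$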
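The closed form for the bandwidth constant follows from minimizing a function of the form $a/h + b\,h^{2(r+1)}$ in $h$. The sketch below computes it from given matrices; the function names and the default choice $A = I$ are assumptions made for illustration, and in practice the three $\Sigma$ inputs would be replaced by estimates.

```python
import numpy as np


def _mse_terms(sxx, sxv, sxl, a=None):
    """Variance and squared-bias coefficients of the asymptotic MSE."""
    sxx = np.asarray(sxx, dtype=float)
    a = np.eye(sxx.shape[0]) if a is None else np.asarray(a, dtype=float)
    sxx_inv = np.linalg.inv(sxx)
    m = sxx_inv @ a @ sxx_inv
    var_term = np.trace(m @ np.asarray(sxv, dtype=float))
    sxl = np.asarray(sxl, dtype=float)
    bias_term = float(sxl @ m @ sxl)
    return var_term, bias_term


def amse_factor(h, sxx, sxv, sxl, r, a=None):
    """Asymptotic MSE up to the factor n^{-2(r+1)/(2(r+1)+1)}."""
    var_term, bias_term = _mse_terms(sxx, sxv, sxl, a)
    return var_term / h + h ** (2 * (r + 1)) * bias_term


def optimal_bandwidth_constant(sxx, sxv, sxl, r, a=None):
    """Minimizer of amse_factor over h > 0."""
    var_term, bias_term = _mse_terms(sxx, sxv, sxl, a)
    return (var_term / (2 * (r + 1) * bias_term)) ** (1.0 / (2 * (r + 1) + 1))
```

Setting the derivative $-a/h^2 + 2(r+1)\,b\,h^{2r+1}$ to zero gives $h^{2(r+1)+1} = a/(2(r+1)b)$, which is what `optimal_bandwidth_constant` returns.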

This last expression suggests that we may construct a consistent estimate of $h^*$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{xv}$ and $\Sigma_{x\lambda}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{xv}$ and $\Sigma_{x\lambda}$.

THEOREM 2: Assume that Assumptions R1-R12 hold. (a) Let $\hat\beta_n$ be a consistent estimator of $\beta$ based on $h_n = h\cdot n^{-1/(2(r+1)+1)}$, and define $\Delta\hat e_i \equiv \Delta y_i - \Delta x_i\hat\beta_n$.

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's world wide web page.


Then

(b) Let $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then, for $\Delta\hat e_i$ defined as in part (a),

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given $r$ and $n$, choose any $h_n = h\cdot n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, with $h$ an arbitrary positive constant and $0 < \delta < 1$. Compute $\hat\beta_n$ based on $h_n$, and construct the residuals as defined in Theorem 2. Use these residuals to compute the estimates of $\Sigma_{xx}$, $\Sigma_{xv}$ and $\Sigma_{x\lambda}$ as discussed above. Then estimate $h^*$ by $\hat h^*$ using equation (3.21), with $\Sigma_{xx}$, $\Sigma_{xv}$ and $\Sigma_{x\lambda}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h^*$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the "plug-in" method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant $h$. Second, by specifying the order of differentiability $r$, the researcher is restricted to a certain smoothness class.

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h\cdot n^{-1/(2(r+1)+1)}$, run an OLS regression of $\hat Y_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta y_i\,\Phi_i$ on $\hat X_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta x_i\,\Phi_i$, and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix, using the residuals from the regression, $\hat e_i = \hat Y_i - \hat X_i\hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\hat Y_i$ on $\hat X_i$ with bandwidth $h_n = h_{n,\delta}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$ in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency of $\hat\beta_n$ if $h_n^{-1}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e. for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$.

The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-n asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.
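The OLS-based computation of the estimate and its standard errors described earlier in this subsection can be sketched as follows. This is a sketch under stated assumptions rather than the author's routine: `sqrt_kernel_ols` is a hypothetical name, the kernel is taken to be the standard normal density (so that the square root is well defined), and the covariance is the textbook heteroskedasticity-robust sandwich form without a degrees-of-freedom correction, since the exact expression in the source is not legible here.

```python
import numpy as np


def sqrt_kernel_ols(dy, dx, dw, phi, gamma_hat, h):
    """OLS of sqrt(K)*dy*phi on sqrt(K)*dx*phi with Eicker-White standard errors.

    dy: (n,), dx: (n, k), dw: (n, q), phi: (n,) 0/1 selection indicator,
    gamma_hat: (q,) first-step estimate, h: bandwidth.
    """
    index = (dw @ gamma_hat) / h
    k = np.exp(-0.5 * index**2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel (assumed)
    root = np.sqrt(k) * phi                              # phi is 0/1, so this is sqrt(k * phi)
    y = root * dy                                        # transformed regressand
    x = root[:, None] * dx                               # transformed regressors
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    xtx_inv = np.linalg.inv(x.T @ x)
    meat = (x * resid[:, None] ** 2).T @ x               # sum_i e_i^2 x_i' x_i
    cov = xtx_inv @ meat @ xtx_inv                       # sandwich covariance
    return beta, np.sqrt(np.diag(cov))
```

Running the same function with the larger bandwidth $h_{n,\delta}$ gives the auxiliary estimate needed for the bias correction of the Corollary.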

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3}h_n \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a "smoothed" conditional maximum score estimator, which under weak (but stronger than Manski's) assumptions is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness

14 Consistency of $\hat\beta_n$ may be established under a weaker restriction on the rate at which $\|\hat\gamma_n - \gamma\|$ vanishes. The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion. This modification does not alter the basic restriction for obtaining an asymptotic distribution for $\hat\beta_n$ which does not depend on the estimation of $\gamma$ in the first step, namely that $\gamma$ has to be estimated at a faster rate than $\beta$. Notice that in this case, the upper bound on $\mu$ in Assumption R12 would have to be replaced by $(6p - 1)/7$. However, this modification would affect the proof of Theorem 2, which would become unnecessarily complicated and long.


we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, the first step estimation method, the performance in practice of the proposed plug-in method for estimating the bandwidth constant and, finally, the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where $\beta_0 = 1$; $\gamma_1 = \gamma_2 = 1$; $w_1$ and $w_2$ are independent $N(-1, 1)$ variables; q = (w + w)2 + 25, with 5 an independent variable distributed uniformly over the interval (0, 1); $u$ is logistically distributed, normalized to have variance equal to 1; x= w; a = (w + w )2 + 5, with 5 an independent N(0 2) variable; and s = 08t3 + 06ul, with 5 an independent standard normal variable. All data are generated i.i.d. across individuals and over time. This design implies that $\Pr(d_{i1} + d_{i2} = 1) = 0.37$ and $\Pr(d_{i1} = d_{i2} = 1) = 0.31$, so that approximately 37 percent of each sample is used in the first step estimation of the selection equation, and approximately 31 percent in the second step. Each Monte Carlo experiment is performed 1000 times, while the same pseudorandom number sequences are used for each one of three different sample sizes $n$: 250, 1000 and 4000.

Table I presents the finite sample properties of the "naive" estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences, using only those individuals that are selected into the sample in both time periods, i.e. those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator, with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median


TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$ (columns: Mean Bias, Median Bias, RMSE, MAD)

Panel B: Sizes of t-tests (nominal levels: 0.01, 0.05, 0.10, 0.20)

bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$, at the 1, 5, 10 and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(\nu) = \phi(\nu)$, where $\phi$ is the density of the standard normal distribution,

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction) and $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction); within each group: Mean Bias, Median Bias, RMSE, MAD.

Panel A, true $\gamma$: 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B, $\hat\gamma_{L}$: 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C, $\hat\gamma_{CMS}$: 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D, $\hat\gamma_{SCMS4}$: 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E, $\hat\gamma_{SCMS2}$: 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098


which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h\cdot n^{-1/(2(r+1)+1)} = h\cdot n^{-1/5}$, with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights. In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_{L}$, which in this case will be consistent, since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator, denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator, denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E), the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window

15 In the construction of the kernel weights of both the infeasible estimator $\tilde\beta_n$ of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

The CMS estimates are computed by maximizing the objective function $(1/n)\sum_i \Delta d_i\,1\{\Delta w_{1i}g_1 + \Delta w_{2i}g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.

17 The SCMS estimates are computed by maximizing the smoothed objective function over all $g \in \mathbb{R}^2$ that have $|g_2| = 1$ and $g_1$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(\nu) = K_4(\nu)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(\nu) = \Phi(\nu)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5\cdot n^{-1/(2m+1)}$ ($m = 2$ or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details, see Horowitz (1992) and Kyriazidou (1994).


width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.
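The remark that the RMSE falls by less than half when the sample size is quadrupled can be checked with a line of arithmetic. The sketch below assumes that the RMSE is exactly proportional to $n^{-a}$ for a rate exponent $a$; the function name `rmse_ratio` is hypothetical and used only for illustration.

```python
def rmse_ratio(rate_exponent: float, size_multiplier: float = 4.0) -> float:
    """Factor by which an RMSE proportional to n**(-rate_exponent) changes
    when the sample size is multiplied by size_multiplier."""
    return size_multiplier ** (-rate_exponent)


# Root-n rate versus the n^(-2/5) rate quoted for a second order kernel.
root_n = rmse_ratio(0.5)
second_order = rmse_ratio(2 / 5)
print(round(root_n, 3), round(second_order, 3))  # prints: 0.5 0.574
```

Under this idealization a root-n consistent estimator would halve its RMSE, while an estimator converging at rate $n^{-2/5}$ would only reduce it to about 57 percent of its previous value.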

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for the estimator without and with the asymptotic bias correction, using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).18

In Panels C and D we use a fourth and a sixth order bias-reducing kernel19 and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h$ is used, while their RMSE decreases (except in Panel IV-D). This is because, in general, $\hat h$ is larger than the initial constant (here the initial bandwidth constant is set equal to one20). Table V displays the mean of $\hat h$ across 1000 replications for different specifications of the initial constant, for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here, the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).
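One way such a standard error could be computed is sketched below. It assumes the usual asymptotic formula for the standard error of a sample median, $1/(2 f(m) \sqrt{R})$ with $R$ replications, and it assumes that the rule-of-thumb bandwidth takes the familiar form $1.06\,\sigma R^{-1/5}$; both the formula and the function names are illustrative assumptions, not taken from the text.

```python
import math
import statistics


def normal_kernel_density(data, point, bandwidth):
    """Kernel density estimate at `point` using a standard normal kernel."""
    total = sum(math.exp(-0.5 * ((point - x) / bandwidth) ** 2) for x in data)
    return total / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))


def median_standard_error(draws):
    """Approximate standard error of the sample median of `draws`,
    using 1 / (2 * f(m) * sqrt(R)) with f estimated by a normal kernel."""
    r = len(draws)
    sigma = statistics.stdev(draws)
    bandwidth = 1.06 * sigma * r ** (-0.2)  # assumed rule-of-thumb form
    m = statistics.median(draws)
    f_m = normal_kernel_density(draws, m, bandwidth)
    return 1.0 / (2.0 * f_m * math.sqrt(r))
```

Here `draws` would hold the estimates from the Monte Carlo replications; since the median bias differs from the median only by the constant true value, the same standard error applies to both.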

19 The fourth-order kernel is $K_4(\nu) = 1.1 \exp(-\nu^2/2) - 0.1 \exp(-\nu^2/(2 \cdot 11)) (1/\sqrt{11})$, and the sixth-order kernel is $K_6(\nu) = 1.5 \exp(-\nu^2/2) - 0.6 \exp(-\nu^2/(2 \cdot 4)) (1/\sqrt{4}) + 0.1 \exp(-\nu^2/(2 \cdot 9)) (1/\sqrt{9})$. See Bierens (1987).
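The bias-reducing property of these kernels can be verified numerically: a kernel of order four should have zero second moment, and one of order six zero second and fourth moments. The sketch below assumes that each exponential term is additionally scaled by $1/\sqrt{2\pi}$, so that every term is a normal density and the kernel integrates to one; that normalization and all function names are assumptions made for illustration.

```python
import math


def normal_pdf(v, variance=1.0):
    """Density of N(0, variance) evaluated at v."""
    return math.exp(-v * v / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def fourth_order_kernel(v):
    # 1.1 * phi(v) - 0.1 * (density of N(0, 11))
    return 1.1 * normal_pdf(v) - 0.1 * normal_pdf(v, 11.0)


def sixth_order_kernel(v):
    # 1.5 * phi(v) - 0.6 * (density of N(0, 4)) + 0.1 * (density of N(0, 9))
    return 1.5 * normal_pdf(v) - 0.6 * normal_pdf(v, 4.0) + 0.1 * normal_pdf(v, 9.0)


def moment(kernel, j, lo=-40.0, hi=40.0, steps=40000):
    """Trapezoid-rule approximation of the integral of v**j * kernel(v)."""
    width = (hi - lo) / steps
    total = 0.5 * (lo ** j * kernel(lo) + hi ** j * kernel(hi))
    for i in range(1, steps):
        v = lo + i * width
        total += v ** j * kernel(v)
    return total * width


for name, kernel in (("fourth", fourth_order_kernel), ("sixth", sixth_order_kernel)):
    print(name, [round(moment(kernel, j), 6) for j in (0, 2, 4)])
```

With this normalization the weights on the component densities sum to one, and the variances 11, 4, and 9 are exactly those that cancel the lower even moments.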

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size n = 100000


TABLE III

FINITE SAMPLE PROPERTIES OF $\tilde\beta_n$ AND $\tilde{\tilde\beta}_n$; TRUE $\gamma$

(Without Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD
(With Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD

Panel A: $K(\nu) = \phi(\nu)$, $h_n = 0.5 \cdot n^{-1/5}$
0.0040 0.3463 0.2140 -0.0017 0.0065
0.0064 0.1930 0.1308 0.0053 0.0023
0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B: $K(\nu) = \phi(\nu)$, $h_n = 3 \cdot n^{-1/5}$
0.0631 0.1550 0.1097 0.0542 0.0566
0.0459 0.0933 0.0626 0.0435 0.0426
0.0351 0.0565 0.0418 0.0316 0.0321

Panel C: $K(\nu) = K_4(\nu)$, $h_n = n^{-1/9}$
0.0246 0.1966 0.1390 0.0080 0.0121
0.0159 0.1067 0.0723 0.0099 0.0003
0.0159 0.0582 0.0397 0.0051 0.0054

Panel D: $K(\nu) = K_6(\nu)$, $h_n = n^{-1/13}$
0.0269 0.1973 0.1362 0.0002 0.0030
0.0144 0.1041 0.0719 0.0032 -0.0031
0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for n = 250, 1000, and 4000 are 0.0110, 0.0061, 0.0035 for Panel A and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

The estimated standard errors of the median bias estimates for n = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = \hat h \cdot n^{-1/5}$, INITIAL $h = 1$, $K(\nu) = \phi(\nu)$

(Without Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD
(With Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD

Panel A: True $\gamma$
0.1919 0.1287 0.0261
0.1053 0.0700 0.0330
0.0653 0.0507 0.0273

Panel B: $\hat\gamma_{L}$
0.1703 0.1191 0.0454
0.1000 0.0693 0.0465
0.0654 0.0504 0.0385

Panel C: $\hat\gamma_{CMS}$
0.2117 0.1329 0.0221
0.1114 0.0718 0.0246
0.0671 0.0507 0.0246

Panel D: $\hat\gamma_{SCMS4}$
0.1543 0.1086 0.0705
0.1004 0.0740 0.0604
0.0658 0.0488 0.0401


TABLE V

Initial h = 0.5    Initial h = 1    Initial h = 2    Initial h = 3

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the

FIGURE 1.-Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$.

Panel title: True $\gamma$ (Figs. 1a, 1b, 1c).

Note: Figures 1a, 1d, 1g: n = 250. Figures 1b, 1e, 1h: n = 1000. Figures 1c, 1f, 1i: n = 4000.


estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.
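The construction behind such a quantile-quantile comparison can be sketched as follows: the ordered Monte Carlo estimates are paired with the quantiles of a normal distribution whose mean and variance match the sample mean and sample variance of the estimates. The function name `qq_pairs` and the simple rule for picking empirical quantiles are assumptions made for illustration.

```python
import statistics


def qq_pairs(draws, probs):
    """Pair quantiles of a normal with the sample mean and variance of
    `draws` with the corresponding empirical quantiles of `draws`."""
    ordered = sorted(draws)
    n = len(ordered)
    reference = statistics.NormalDist(statistics.fmean(draws), statistics.stdev(draws))
    pairs = []
    for p in probs:
        # Nearest-rank style index into the ordered sample (illustrative choice).
        index = min(n - 1, max(0, int(round(p * (n - 1)))))
        pairs.append((reference.inv_cdf(p), ordered[index]))
    return pairs
```

If the finite sample distribution is close to normal, the returned pairs lie near the 45-degree line; systematic departures in the tails would indicate skewness or heavy tails.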

Finally, we examine the size of t tests, where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$. To this end, we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that, although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic, since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
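The empirical size reported in such a table is simply a rejection frequency across replications. The sketch below assumes two-sided tests with standard normal critical values; the function name `rejection_rates` and its arguments are hypothetical.

```python
import statistics


def rejection_rates(estimates, std_errors, true_value,
                    levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of replications in which a two-sided t test of
    H0: beta = true_value rejects, for each nominal level."""
    normal = statistics.NormalDist()
    t_stats = [(b - true_value) / se for b, se in zip(estimates, std_errors)]
    rates = {}
    for level in levels:
        critical = normal.inv_cdf(1.0 - level / 2.0)  # assumed normal critical value
        rates[level] = sum(abs(t) > critical for t in t_stats) / len(t_stats)
    return rates
```

Each entry of `estimates` and `std_errors` would come from one Monte Carlo replication; a test with correct size returns rates close to the nominal levels.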

TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$

(Without Asymptotic Bias Correction): 0.01, 0.05, 0.10, 0.20
(With Asymptotic Bias Correction): 0.01, 0.05, 0.10, 0.20

Panel A: True $\gamma$
0.1610 0.2530 0.0590
0.1240 0.2180 0.0260
0.1120 0.2260 0.0210

Panel B: $\hat\gamma_{L}$
0.1580 0.2680 0.0450
0.1160 0.2140 0.0230
0.1140 0.2250 0.0180

Panel C: $\hat\gamma_{CMS}$
0.1600 0.2720 0.0610
0.1170 0.2160 0.0350
0.1180 0.2390 0.0240

Panel D: $\hat\gamma_{SCMS4}$
0.1430 0.2570 0.0280
0.1220 0.2250 0.0190
0.1230 0.2430 0.0250


5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to "difference out" both the unobservable individual effect and the sample selection effect, in a manner similar to the "fixed-effects" approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMA A1: Let $S_n = (1/n) \sum_{i=1}^{n} (1/h_n) L(W_i/h_n) Z_i W_i^s$, $s \ge 0$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution that has $E(|Z|^2 \mid W) < M < \infty$ for almost all $W$, and the function $L$ satisfies $\int |\nu|^s |L(\nu)| \, d\nu < M$. Then $E(S_n) = O(h_n^s)$ and $\mathrm{var}(S_n) = O(h_n^{2s}/(n h_n))$. Thus, for $s \ge 1$, $S_n \xrightarrow{p} 0$, while for $s = 0$, $S_n \xrightarrow{p} f_W(0) E(Z \mid W = 0) \int L(\nu) \, d\nu$, provided that $E(Z \mid W)$ is continuous at $W = 0$.


PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

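LEMMA A2 (Liapounov CLT for double arrays): Let $\xi_n = (1/\sqrt{n}) \sum_{i=1}^{n} \xi_{in}$, where $\{\xi_{in}\}$ is an independent sequence of scalar random variables that satisfies $E(\xi_{in}) = 0$, $\mathrm{var}(\xi_{in}) < \infty$, $\mathrm{var}(\xi_n) \to V < \infty$, and $\sum_{i=1}^{n} E|\xi_{in}/\sqrt{n}|^{2+\delta} \to 0$ for some $\delta \in (0, 1)$, as $n \to \infty$. Then $\xi_n \xrightarrow{d} N(0, V)$.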

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1974).

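COROLLARY A1: Let $\xi_{in} = (1/\sqrt{h_n}) L(W_i/h_n) Z_i$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution such that $E(Z \mid W) = 0$ and $E(|Z|^{2+\delta} \mid W) < M < \infty$ for almost all $W$, $E(Z^2 \mid W)$ is continuous at $W = 0$, and the function $L$ satisfies $\int |L(\nu)|^{2+\delta} \, d\nu < \infty$. Then $\xi_n = (1/\sqrt{n}) \sum_{i=1}^{n} \xi_{in} \xrightarrow{d} N(0, f_W(0) E(Z^2 \mid W = 0) \int L(\nu)^2 \, d\nu)$.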

PROOFOF LEMMA1 (a) Apply Lemma A1 with 2= Ax Ax di (1 j = I k) s = 0 and L(v ) = K(v)

(b-i) Apply Lemma A2 with tt= c1(1 amp)K(Uh) Ax Ac where c is a k X 1 vector of constants such that cc = 1

(b-ii) Note that by Assumption R5 Ah = AWThus wc may write

S = Ax( 1 ~ 1 ) I 3 ~ = ~ ( l h ) K ( H ( h )

Therefore E(S) = l(lh)K(Wh)Wg(W) dW where g(W) - E(Axr AlW)fw(W) is by assumption r times colltinuously differentiable with derivatives that are bounded on the support of W and has g(0) lt m A Taylor series expansion of g() around 0 and a change of variables W = vh lead to

17

1


for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$. (iii) In the Taylor series expansions c stands for a generic value between U and

PROOF OF LEMMA 2: (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a


(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that A and A are 0(1) while A = o(l) The desired result will then follow from the fact that p lt p 2 implies that h i 1 ( - y ) = Op(niL-1= o(l)

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

We will show that Bl and B are 0(1) while B3 = o(l) Thc desired result will thenfollow from the fact that 1 - 2 p lt u lt p 2 implies that hi1(- y ) = Op(nL-) = o(l) and - y ) = o(n -~-) = o(l)


Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for r r 1

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+


where Ai and A are defined as in the proof of part (b-1) As we showed there both these quantities are bounded in probability for any h that satisfies h -O and nh -t 13 as n increases Furthermore from (1) above hi1( - y) = op(l) T ~ L I S OP(nF-) = the first two terms of the sum above are o(l) Now by (21

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where Bi and B2 are defined as in the proof of part (b-ii) and as we showed there they arc houndcd in probability for any I that satisfies nh + 13 as n increases Thus the first two terms of the sum above are o(l) Furthermore

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HARDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of A Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.



variable $d_{it} \in \{0, 1\}$. In particular, it is assumed that, while $(d_{it}, w_{it})$ is always observed, $(y_{it}^*, x_{it}^*)$ is observed only6 if $d_{it} = 1$. In other words, the "selection" variable $d_{it}$ determines whether the $it$th observation in equation (2.1) is censored or not. Thus, our problem is to estimate $\beta$ and $\gamma$ from a sample consisting of quadruples $(d_{it}, w_{it}, y_{it}, x_{it})$. We will denote the vector of (observed and unobserved) explanatory variables by $\zeta_i \equiv (w_{i1}, w_{i2}, x_{i1}^*, x_{i2}^*, \alpha_i^*, \eta_i)$. Notice that without the "fixed effects" $\alpha_i^*$ and $\eta_i$ our model becomes a panel data version of the well known sample selection model considered in the literature, and could be estimated by any of the existing methods. Without sample selectivity, that is, with $d_{it} = 1$ for all $i$ and $t$, equation (2.1) is the standard panel data linear regression model.

In our setup, it is possible to estimate $\gamma$ in the discrete choice "selection" equation (2.2) using either the conditional maximum likelihood approach proposed by Rasch (1960, 1961) and Andersen (1970), or the conditional maximum score method proposed by Manski (1987). On the other hand, estimation of $\beta$ based on the main equation of interest (2.1) is confronted with two problems: first, the presence of the unobservable effect $\alpha_{it} \equiv d_{it}\alpha_i^*$; and second, and more fundamental, the potential "endogeneity" of the regressors $x_{it} \equiv d_{it} x_{it}^*$, which arises from their dependence on the selection variable $d_{it}$, and which may result in "selection bias."

The first problem is easily solved by noting that for those observations that have $d_{i1} = d_{i2} = 1$, time differencing will eliminate the effect $\alpha_{it}$ from equation (2.1). This is analogous to the "fixed-effects" approach taken in linear panel data models. In general though, application of standard methods, e.g. OLS, on this first-differenced subsample will yield inconsistent estimates of $\beta$, due to sample selectivity. This may be seen from the population regression function for the first-differenced subsample:

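$E(y_{i1} - y_{i2} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$

$= (x_{i1}^* - x_{i2}^*)\beta + E(\varepsilon_{i1}^* - \varepsilon_{i2}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i).$

In general, there is no reason to expect that $E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = 0$, or that $E(\varepsilon_{i1}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = E(\varepsilon_{i2}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$. In particular, for each time period the "sample selection effect" $\lambda_{it} \equiv E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$ depends not only on the (partially unobservable) conditioning vector $\zeta_i$, but also on the (generally unknown) joint conditional distribution of $(\varepsilon_{it}^*, u_{i1}, u_{i2})$, which may differ across individuals, as well as over time for the same individual:

$\lambda_{it} \equiv E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$

$= E(\varepsilon_{it}^* \mid u_{i1} \le w_{i1}\gamma + \eta_i, \; u_{i2} \le w_{i2}\gamma + \eta_i, \; \zeta_i)$

$= \Lambda(w_{i1}\gamma + \eta_i, \; w_{i2}\gamma + \eta_i; \; F_{it}(\varepsilon_{it}^*, u_{i1}, u_{i2} \mid \zeta_i))$

$\equiv \Lambda_{it}(w_{i1}\gamma + \eta_i, \; w_{i2}\gamma + \eta_i, \; \zeta_i).$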

6 Obviously, the analysis carries through to the case where $x_{it}^*$ is always observed, which is the case most commonly treated in the literature.


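It is convenient to rewrite the main equation (2.1) as a "partially linear regression":

$y_{it} = x_{it}\beta + \alpha_{it} + \lambda_{it} + \nu_{it},$

where $\nu_{it} \equiv \varepsilon_{it} - \lambda_{it}$ is a new error term, which by construction satisfies $E(\nu_{it} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = 0$. The idea of our scheme for estimating $\beta$ is to "difference out" the nuisance terms $\alpha_{it}$ and $\lambda_{it}$ from the equation above.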

As a motivation of our estimation procedure, consider the case where $(\varepsilon_{it}^*, u_{it})$ is independent and identically distributed over time and across individuals, and is independent of $\zeta_i$. Under these assumptions, it is easy to see that

$\lambda_{it} = \Lambda(w_{it}\gamma + \eta_i),$

where $\Lambda(\cdot)$ is an unknown function, the same over time and across individuals, of the single index $w_{it}\gamma + \eta_i$. Obviously, in general, $\lambda_{i1} \ne \lambda_{i2}$ unless $w_{i1}\gamma = w_{i2}\gamma$. In other words, for an individual $i$ that has $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$, the sample selection effect $\lambda_{it}$ will be the same in the two periods. Thus, for this particular individual, applying first-differences in equation (2.1) will eliminate both the unobservable effect $\alpha_{it}$ and the selection effect $\lambda_{it}$. At this point, it is important to notice that even if the functional form of $\Lambda$ were known (as, for example, in the case of a bivariate normal distribution; see Heckman (1976)), it would still involve the unobservable effect $\eta_i$. This suggests that it would be generally infeasible to consistently estimate $\beta$ from (2.1), even in the absence of the effect $\alpha_{it}$ and with knowledge of $\gamma$, unless a parametric form for the distribution of $\eta_i$ conditional on the observed exogenous variables were also specified.

The preceding argument for differencing out both nuisance terms from equation (2.1) will hold under much weaker distributional assumptions. In particular, since first-differences are taken on an individual basis, it is not required that $(\varepsilon_{it}^*, u_{it})$ be i.i.d. across individuals, nor that it be independent of the individual-specific vector $\zeta_i$. In other words, we may allow the functional form of $\Lambda$ to vary across individuals. It is also possible to allow for serial correlation in the errors. Consider, for example, the case where $(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2})$ and $(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$, i.e. $F(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1} \mid \zeta_i)$. Under this conditional exchangeability assumption, it is easy to see that for an individual $i$ that has $w_{i1}\gamma = w_{i2}\gamma$, $\lambda_{i1} = \lambda_{i2}$.

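Notice that, in general, it is not sufficient to assume joint conditional stationarity of the errors. An extreme example is the case where $\varepsilon_{i1}^*$, $\varepsilon_{i2}^*$, and $u_{i1}$ are i.i.d. N(0, 1) and independent of $\zeta_i$, while $u_{i2} = \varepsilon_{i1}^*$. Then $\lambda_{i1} = E(\varepsilon_{i1}^* \mid \varepsilon_{i1}^* \le w_{i2}\gamma + \eta_i) \ne \lambda_{i2} = E(\varepsilon_{i2}^*)$, regardless of whether $w_{i1}\gamma = w_{i2}\gamma$.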


The above discussion, which presumes knowledge of the true $\gamma$, suggests estimating $\beta$ by OLS from a subsample that consists of those observations that have $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$. Defining $\Psi_i \equiv 1\{w_{i1}\gamma = w_{i2}\gamma\}$, $\Phi_i \equiv 1\{d_{i1} = d_{i2} = 1\} \equiv d_{i1}d_{i2}$, and with $\Delta$ denoting first differences, the OLS estimator is of the form $\hat\beta_n = [\sum_{i=1}^{n} \Delta x_i' \Delta x_i \Psi_i \Phi_i]^{-1} [\sum_{i=1}^{n} \Delta x_i' \Delta y_i \Psi_i \Phi_i]$. Under appropriate regularity conditions, this estimator will be consistent and root-n asymptotically normal. An obvious requirement is that $\Pr(\Delta w_i \gamma = 0) > 0$, which may be satisfied, for example, when all the random variables in $w_{it}$ are discrete, or in experimental cases where the distribution of $w_{it}$ is in the control of the researcher, situations that are rare in economic applications.

Of course, this estimation scheme cannot be directly implemented since $\gamma$ is unknown. Furthermore, as argued above, it may be the case that $\Psi_i = 0$ (i.e. $\Delta w_i \gamma \ne 0$) for all individuals in our sample. Notice though that, if $\Lambda$ is a sufficiently "smooth" function, and $\hat\gamma_n$ is a consistent estimate of $\gamma$, observations for which the difference $\Delta w_i \hat\gamma_n$ is close to zero should also have $\Delta\lambda_i \cong 0$, and the preceding arguments would hold approximately.

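We therefore propose the following two-step estimation procedure, which is in the spirit of Powell (1987), and Ahn and Powell (1993). In the first step, $\gamma$ is consistently estimated based on equation (2.2) alone. In the second step, the estimate $\hat\gamma_n$ is used to estimate $\beta$, based on those pairs of observations for which $w_{i1}\hat\gamma_n$ and $w_{i2}\hat\gamma_n$ are "close." Specifically, we propose

(2.3) $\hat\beta_n = [\sum_{i=1}^{n} \hat\psi_{in} \Delta x_i' \Delta x_i \Phi_i]^{-1} [\sum_{i=1}^{n} \hat\psi_{in} \Delta x_i' \Delta y_i \Phi_i],$

where $\hat\psi_{in}$ is a weight that declines to zero as the magnitude of the difference $|w_{i1}\hat\gamma_n - w_{i2}\hat\gamma_n|$ increases. We choose "kernel" weights of the form

(2.4) $\hat\psi_{in} \equiv (1/h_n) K(\Delta w_i \hat\gamma_n / h_n),$

where $K$ is a "kernel density" function, and $h_n$ is a sequence of bandwidths which tends to zero as $n \to \infty$. Thus, for a fixed (nonzero) magnitude of the difference $|\Delta w_i \hat\gamma_n|$, the weight $\hat\psi_{in}$ shrinks as the sample size increases, while for a fixed $n$, a larger $|\Delta w_i \hat\gamma_n|$ corresponds to a smaller weight.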
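The second step can be sketched in a few lines for the two-period case. The sketch below makes simplifying assumptions that are not in the text: the regressor $x_{it}$ is a scalar (so no matrix inverse is needed), the kernel is the standard normal density, and the first-step estimate `gamma_hat` is supplied by the caller. The function and argument names are hypothetical.

```python
import math


def normal_kernel(v):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def kernel_weighted_fd(y, x, w, d, gamma_hat, bandwidth):
    """Kernel-weighted first-difference estimate of a scalar beta (T = 2).

    y, x, d: lists of (period-1, period-2) pairs, one pair per individual.
    w: list of pairs of regressor vectors for the selection equation.
    gamma_hat: first-step estimate of the selection coefficients.
    """
    numerator = 0.0
    denominator = 0.0
    for (y1, y2), (x1, x2), (w1, w2), (d1, d2) in zip(y, x, w, d):
        if d1 * d2 != 1:  # keep only individuals selected in both periods
            continue
        dx = x1 - x2
        dy = y1 - y2
        # Difference of the estimated selection indices across periods.
        index_gap = sum((a - b) * g for a, b, g in zip(w1, w2, gamma_hat))
        weight = normal_kernel(index_gap / bandwidth) / bandwidth
        numerator += weight * dx * dy
        denominator += weight * dx * dx
    return numerator / denominator
```

A call such as `kernel_weighted_fd(y, x, w, d, gamma_hat, n ** (-1 / 5))` would use the bandwidth rate associated with a second order kernel; individuals whose index difference is large relative to the bandwidth receive weights close to zero and contribute little to either sum.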

It is interesting to note that the arguments used in estimating the main regression equation may be modified to accommodate the case of a truncated sample, that is, when we only observe those individuals that have $d_{it} = 1$ for all time periods. Recall that our method for eliminating the sample selection effect from equation (2.1) is based on the fact that, under certain distributional assumptions, $\Delta w_i\gamma = 0$ implies $\Delta\lambda_i = 0$. However, $\Delta w_i = 0$ also implies $\Delta\lambda_i = 0$. In other words, we might dispense altogether with the first step of estimating $\gamma$, and estimate $\beta$ from those observations for which $w_{i1}$ and $w_{i2}$ are close, which would suggest using the weights $\hat\psi_{in} = (1/h_n)K(\Delta w_i/h_n)$. Although this approach would imply a slower rate of convergence for the resulting estimator, this estimation scheme may be used for estimating $\beta$ from a truncated sample, in which case estimation of the selection equation is infeasible. An obvious drawback of this method is that, in order to consistently estimate the entire parameter vector $\beta$, we would have to impose the restriction that $w_{it}$ and $x^*_{it}$ do not contain any elements in common.

The above analysis extends naturally to the case of a longer (and possibly unbalanced) panel, that is, when $T_i \ge 2$. Then $\beta$ could be estimated from those observations that have $d_{it} = d_{is} = 1$ and for which $w_{it}\hat\gamma_n$ and $w_{is}\hat\gamma_n$ are close, for all $s, t = 1, \ldots, T_i$. The estimator is of the form

where

In the following section we derive the asymptotic properties of our proposed estimator for the main equation of interest, under the assumption that $\gamma$ has been consistently estimated. At the end of the section we examine the applicability of existing estimators for obtaining first-step estimates of the selection equation.

3. ESTIMATION OF THE MAIN EQUATION

3.1. Asymptotic Properties of the Estimator

The derivation of the large sample properties of $\hat\beta_n$ of equations (2.3) and (2.4) proceeds in two steps. First, the asymptotic behavior of the infeasible estimator, which uses the true $\gamma$ in the construction of the kernel weights, denoted by $\tilde\beta_n$, is analyzed. Then, the large sample behavior of the difference $(\hat\beta_n - \tilde\beta_n)$ is investigated.

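It will be useful to define the scalar index $W_i \equiv \Delta w_i\gamma$ and its estimated counterpart $\hat W_i \equiv \Delta w_i\hat\gamma_n$, along with the following quantities:

$$S_{xx} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\Big(\frac{W_i}{h_n}\Big)\Delta x_i'\Delta x_i\Phi_i, \qquad \hat S_{xx} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\Big(\frac{\hat W_i}{h_n}\Big)\Delta x_i'\Delta x_i\Phi_i,$$

$$S_{xv} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\Big(\frac{W_i}{h_n}\Big)\Delta x_i'\Delta v_i\Phi_i, \qquad \hat S_{xv} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\Big(\frac{\hat W_i}{h_n}\Big)\Delta x_i'\Delta v_i\Phi_i,$$

$$S_{x\lambda} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\Big(\frac{W_i}{h_n}\Big)\Delta x_i'\Delta\lambda_i\Phi_i, \qquad \hat S_{x\lambda} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\Big(\frac{\hat W_i}{h_n}\Big)\Delta x_i'\Delta\lambda_i\Phi_i.$$

With these definitions, we can write $\tilde\beta_n - \beta = S_{xx}^{-1}(S_{xv} + S_{x\lambda})$ and $\hat\beta_n - \beta = \hat S_{xx}^{-1}(\hat S_{xv} + \hat S_{x\lambda})$.

Our asymptotic results for the infeasible estimator are based on the following assumptions. From Section 2, $\Phi_i \equiv d_{i1}d_{i2}$, $\zeta_i \equiv (w_{i1}, w_{i2}, x^*_{i1}, x^*_{i2}, \alpha^*_i, \eta_i)$, and $v_{it} \equiv \varepsilon^*_{it} - E(\varepsilon^*_{it} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$.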

ASSUMPTION R1: $(\varepsilon^*_{i1}, \varepsilon^*_{i2}, u_{i1}, u_{i2})$ and $(\varepsilon^*_{i2}, \varepsilon^*_{i1}, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$. That is, $F(\varepsilon^*_{i1}, \varepsilon^*_{i2}, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon^*_{i2}, \varepsilon^*_{i1}, u_{i2}, u_{i1} \mid \zeta_i)$.

As discussed in Section 2, this conditional exchangeability assumption is crucial to our method for eliminating the sample selection effect. Although in principle we could allow $F$ to vary across individuals, it will be convenient for our analysis to assume that cross-section sampling is random.

ASSUMPTION R2: An i.i.d. sample, $\{(x^*_{it}, \varepsilon^*_{it}, \alpha^*_i, w_{it}, u_{it}, \eta_i);\ t = 1, 2\}_{i=1}^n$, is drawn from the population. For each $i = 1, \ldots, n$ and each $t = 1, 2$, we observe $(d_{it}, w_{it}, y_{it}, x_{it})$.

With this assumption, we may from now on drop the subscripts $i$ that denote the identity of each panel member.

ASSUMPTION R3: $E(\Delta x'\Delta x\,\Phi \mid W = 0)$ is finite and nonsingular.

Note that this assumption implicitly imposes an exclusion restriction on the set of regressors, namely that at least one of the variables in the selection equation, $w_{it}$, is not contained in $x_{it}$.

ASSUMPTION R4: The marginal distribution of the index function $W \equiv \Delta w\gamma$ is absolutely continuous, with density function $f_W$, which is bounded from above on its support and strictly positive at zero, i.e., $f_W(0) > 0$. In addition, $f_W$ is almost everywhere $r$ times ($r \ge 1$) continuously differentiable and has bounded derivatives.

Observe that by definition $\Delta x_i\Phi_i = \Phi_i\Delta x^*_i$. Thus, although certain assumptions are stated in terms of the observed regressors $x$, they also hold for the latent (possibly unobserved) $x^*$.

It is possible to relax certain smoothness assumptions, so that they hold only in a neighborhood of $W$ near zero, at the cost though of more technical detail.


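ASSUMPTION R5: The unknown function$^{9}$ $\Lambda(w_t\gamma + \eta, w_\tau\gamma + \eta, \zeta) \equiv E(\varepsilon^*_t \mid d_t = 1, d_\tau = 1, \zeta) \equiv E(\varepsilon^*_t \mid u_t \le w_t\gamma + \eta, u_\tau \le w_\tau\gamma + \eta, \zeta) \equiv \Lambda(s_t, s_\tau, \zeta)$ satisfies $\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \tilde\Lambda\cdot(s_t - s_\tau)$ for $t, \tau = 1, 2$, where $\tilde\Lambda$ is a function of $(s_t, s_\tau, \zeta)$, i.e., $\tilde\Lambda \equiv \tilde\Lambda(s_t, s_\tau, \zeta)$, which is bounded on its support.$^{10}$

This assumption is crucial to our analysis. It will be satisfied, for example, if $\Lambda$ is continuously differentiable with respect to its first two arguments, with bounded first-order partial derivatives (as, for example, when the errors are jointly normally distributed), in which case we may apply the multivariate mean-value theorem:

$$\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \big[\Lambda^{(1)}(c^*, \zeta) - \Lambda^{(2)}(c^*, \zeta)\big](s_t - s_\tau).$$

Here $\Lambda^{(j)}$ ($j = 1, 2$) denotes the first-order partial derivative of $\Lambda$ with respect to its first and second argument, respectively, and $c^*$ lies on the line segment connecting $(w_t\gamma + \eta, w_\tau\gamma + \eta)$ and $(w_\tau\gamma + \eta, w_t\gamma + \eta)$. Thus, in this case, $\tilde\Lambda = \Lambda^{(1)}(c^*, \zeta) - \Lambda^{(2)}(c^*, \zeta)$, and by assumption will be bounded.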

ASSUMPTION R6: (a) $x^*_t$ and $\varepsilon^*_t$ have bounded $4 + 2\delta$ moments conditional on $W$, for any $\delta \in (0, 1)$.

(b) $E(\Delta x'\Delta x\,\Phi \mid W)$ and $E(\Delta x'\Delta x\,\Delta v^2\,\Phi \mid W)$ are continuous at $W = 0$ and do not vanish.

(c) $E(\Delta x'\tilde\Lambda\,\Phi \mid W)$ is almost everywhere $r$ times continuously differentiable as a function of $W$ and has bounded derivatives.

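ASSUMPTION R7: The function $K: \mathbb{R} \to \mathbb{R}$ satisfies: (a) $\int K(\nu)\,d\nu = 1$, (b) $\int |K(\nu)|\,d\nu < \infty$, (c) $\sup_\nu |K(\nu)| < \infty$, (d) $\int |\nu|^{r+1}|K(\nu)|\,d\nu < \infty$, and (e) $\int \nu^j K(\nu)\,d\nu = 0$ for all $j = 1, \ldots, r$.

ASSUMPTION R8: $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$.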

From our analysis in Section 2, it is easy to see that Assumptions R1-R3 would suffice to identify $\beta$ for known $\gamma$. An identification scheme in the spirit of our discussion in Section 2 would obviously require support of $W$ at zero, as well as nonsingularity of the matrix $E(\Delta x'\Delta x\,\Phi \mid W = 0)$ imposed by Assumption R3, analogous to the familiar full rank assumption.

The continuity of the distribution of the index $W$ imposed in Assumption R4 is a regularity condition, common in kernel estimation of density and regression functions. It is precisely this continuity that renders the OLS estimator of Section 2 infeasible, even if $\gamma$ were known.

9 Notice that by Assumption R1 the functional form of $\Lambda$ is the same over time for the same individual, while by Assumption R2 it is also the same across individuals.

10 In principle, we could dispense with the assumption that $\tilde\Lambda$ is bounded, by assuming that it has finite fourth moment conditional on $W$.


Since our estimation scheme is based on pairs of observations for which $W_i \equiv \Delta w_i\gamma \cong 0$, it is obvious that additional smoothness conditions are required. These are imposed by Assumptions R4-R8. Notice in particular Assumption R5, which imposes a Lipschitz continuity property on the selection correction function $\Lambda(\cdot)$. It is easy to see that simple continuity will not be sufficient to guarantee that $\Delta\lambda_i \to 0$ as $W_i \to 0$, since $\Delta\lambda_i$ is not a function of $W_i$. Furthermore, similarly to kernel density and regression estimation, a high order of differentiability $r$ for certain functions of the index $W$, along with the appropriate choice of the kernel function and the bandwidth sequence, imply a faster rate of convergence in distribution for $\tilde\beta_n$. Specifically, we choose an $(r+1)$th order bias-reducing kernel, which for $r \ge 2$ is required by Assumption R7(e) to be negative in part of its domain.

The next lemma establishes the asymptotic properties of the infeasible estimator $\tilde\beta_n$.

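LEMMA 1: Let Assumptions R1-R8 hold. Define

$$\Sigma_{xx} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Phi \mid W = 0),$$

$$\Sigma_{xv} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Delta v^2\,\Phi \mid W = 0)\int K(\nu)^2\,d\nu,$$

$$\Sigma_{x\lambda} \equiv \frac{1}{r!}\,g^{(r)}(0)\int \nu^{r+1}K(\nu)\,d\nu,$$

where $g^{(r)}(0)$ is the $(k \times 1)$ vector of $r$th-order derivatives of $g(W) \equiv E(\Delta x'\tilde\Lambda\,\Phi \mid W)\,f_W(W)$, evaluated at $W = 0$. Then:

(a) $S_{xx} \to_p \Sigma_{xx}$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}\,S_{xv} \to_d N(0, \Sigma_{xv})$ and (ii) $\sqrt{nh_n}\,S_{x\lambda} \to_p \tilde h\,\Sigma_{x\lambda}$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}S_{xv} \to_p 0$ and (ii) $h_n^{-(r+1)}S_{x\lambda} \to_p \Sigma_{x\lambda}$.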

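The asymptotic properties of $\tilde\beta_n$ easily follow from the previous Lemma. If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\tilde\beta_n - \beta) \to_d N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\tilde\beta_n - \beta) \to_p \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.

In order to derive the asymptotic properties of the feasible estimator $\hat\beta_n$, we will make the following additional assumptions.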

ASSUMPTIONR9 In addition to the conditions of Assumption R7 the kernel function satisfies (a) K ( v ) is three times continuously differentiable with bounded deriuatiues and (b) IKr(vgtldv lIK(v)l dv l ~ ~ K ( v ) ~ d v and ~ v ~ K ( v ) ~ ~ v are finite


The conditions of Assumption R9 are satisfied, for example, for $K(\nu)$ being the standard normal density function, which is a second order kernel.
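The claim that the standard normal density is a second order kernel can be checked numerically against the moment conditions of Assumption R7 with $r = 1$: the kernel integrates to one, its first moment vanishes, and its second moment does not. The sketch below is only an illustrative check; the helper name `kernel_moment`, the integration range, and the grid size are assumptions.

```python
import numpy as np

def gaussian_kernel(v):
    """Standard normal density."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)

def kernel_moment(kernel, j, half_width=12.0, num=200001):
    """Approximate the j-th moment of a kernel by the trapezoid rule
    on a symmetric grid (hypothetical helper)."""
    v = np.linspace(-half_width, half_width, num)
    f = v**j * kernel(v)
    dv = v[1] - v[0]
    return float(np.sum(0.5 * (f[1:] + f[:-1])) * dv)

m0 = kernel_moment(gaussian_kernel, 0)  # integrates to one, R7(a)
m1 = kernel_moment(gaussian_kernel, 1)  # first moment vanishes, R7(e) with r = 1
m2 = kernel_moment(gaussian_kernel, 2)  # second moment is nonzero: order two
```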

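ASSUMPTION R10: $x^*_t$, $\varepsilon^*_t$, and $w_t$ have bounded $8 + 4\delta$ moments conditional on $W$, for some $\delta \in (0, 1)$. In addition, $E(\Delta x_l\,\Delta v\,\Delta w_j\,\Phi \mid W)$ and $E(\Delta x_l\,\Delta v\,\Delta w_j\,\Delta w_m\,\Phi \mid W)$ are continuous at $W = 0$, for all $l = 1, \ldots, k$ and $j, m = 1, \ldots, q$.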

ASSUMPTION R11: The parameter vector $\gamma$ in the selection equation lies in a compact$^{11}$ set, and $\hat\gamma_n$ is a consistent estimator that satisfies $\hat\gamma_n - \gamma = O_p(n^{-p})$, where $2/5 < p \le 1/2$.

For example, $p = 1/2$ if $\gamma$ is estimated by maximizing the conditional likelihood function.

ASSUMPTION R12: $h_n = h\cdot n^{-\mu}$, where $0 < h < \infty$ and $1 - 2p < \mu < p/2$.

Assumption R12 is crucial for establishing the result that follows. This result states that $\hat S_{xx}$, $\hat S_{xv}$, and $\hat S_{x\lambda}$ have the same probability limits as their infeasible counterparts $S_{xx}$, $S_{xv}$, and $S_{x\lambda}$, provided that the bandwidth sequence $h_n$ is chosen appropriately for any given rate of convergence of the first-step estimator, that is, for any given $p$, and for any degree of smoothness $r$.

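LEMMA 2: Let Assumptions R1-R12 hold. Then:

(a) $\hat S_{xx} - S_{xx} = o_p(1)$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}(\hat S_{xv} - S_{xv}) = o_p(1)$ and (ii) $\sqrt{nh_n}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}(\hat S_{xv} - S_{xv}) = o_p(1)$ and (ii) $h_n^{-(r+1)}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.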

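Lemma 2 readily implies that, if $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\hat\beta_n - \tilde\beta_n) = o_p(1)$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \tilde\beta_n) = o_p(1)$. Since $(\hat\beta_n - \beta) = (\hat\beta_n - \tilde\beta_n) + (\tilde\beta_n - \beta)$, we have the following theorem.

THEOREM 1: Let Assumptions R1-R12 hold.

(a) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then $\sqrt{nh_n}(\hat\beta_n - \beta) \to_d N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \beta) \to_p \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.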

11 Compactness of the parameter space is required for consistency of both Manski's estimator and the smoothed conditional maximum score estimator, while it is not required for the conditional maximum likelihood estimator. Notice, though, that since $\gamma$ can only be estimated up to scale, we can always normalize it so that it lies on the unit circle. Thus, the compactness assumption is not restrictive.


Thus, in the limit, the fact that we are using $\hat\gamma_n$ to estimate $\beta$ does not affect the asymptotic distribution of $\hat\beta_n$. The lower bound on $\mu$ imposed by Assumption R12 is the key for this result to hold. In words, this bound implies that $\beta$ is estimated at a rate slower than $\gamma$. Indeed, from Theorem 1, the rate of convergence of $\hat\beta_n$ is $(nh_n)^{-1/2} = h^{-1/2}n^{-(1-\mu)/2}$, which is obviously slower than $n^{-p}$, since $\mu > 1 - 2p$. Thus, in effect, Assumption R12 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$.

In principle, we could allow $\beta$ to be estimated at the same rate as $\gamma$. Thus, if $\sqrt{nh_n}(\hat\gamma_n - \gamma) = O_p(1)$, for $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ we obtain the following asymptotic representation, which may be easily derived from the analysis of Lemma 2(b) in the Appendix:

where

n

0= ( l h ~ ) ~ ( ~ i h )plim (ln) Ax Awi Ahi Qi i = 1

provided that $E(\Delta x'\Delta w\,\tilde\Lambda\,\Phi \mid W)$ is continuous at $W = 0$ and $\nu K(\nu) \to 0$ as $|\nu| \to \infty$. Asymptotic normality of $\hat\beta_n$ may still be established if $\sqrt{nh_n}(\hat\gamma_n - \gamma)$ has an asymptotic representation of the form $\sqrt{nh_n}(\hat\gamma_n - \gamma) = (1/\sqrt{nh_n})\sum_{i=1}^n \psi(\Delta w_i, \Delta d_i, \gamma) + o_p(1)$.$^{12}$

At first glance, it looks attractive to eliminate the asymptotic bias of $\hat\beta_n$ by choosing $h_n$ so that $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h = 0$, or equivalently by setting $\mu > 1/(2(r+1)+1)$. In that case, however, the rate of convergence of $\hat\beta_n$ is lower than when $\tilde h > 0$. Indeed, the rate of convergence in distribution of $\hat\beta_n$ is maximized by making $\mu$ as small as possible, that is, by setting $\mu = 1/(2(r+1)+1)$, in which case it becomes $n^{-(r+1)/(2(r+1)+1)}$. Thus, for $r$ large enough, the estimator converges at a rate that can be arbitrarily close to $n^{-1/2}$, provided also that $\gamma$ is estimated fast enough, that is, provided $p > (r+1)/(2(r+1)+1)$.
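The rate arithmetic in this paragraph is easy to tabulate. The following sketch (function names are illustrative assumptions) computes the bandwidth exponent $\mu = 1/(2(r+1)+1)$ and the resulting convergence exponent $(r+1)/(2(r+1)+1)$ exactly, using rational arithmetic.

```python
from fractions import Fraction

def bandwidth_exponent(r):
    """mu = 1 / (2(r+1) + 1): the bandwidth rate that maximizes the
    rate of convergence in distribution for smoothness order r."""
    return Fraction(1, 2 * (r + 1) + 1)

def convergence_exponent(r):
    """(r+1) / (2(r+1) + 1): the estimator then converges at rate
    n ** (-convergence_exponent(r))."""
    return Fraction(r + 1, 2 * (r + 1) + 1)

# Tabulate mu and the convergence exponent for the kernel orders
# two, four, and six (r = 1, 3, 5).
rates = {r: (bandwidth_exponent(r), convergence_exponent(r)) for r in (1, 3, 5)}
```

For $r = 1$ this gives $\mu = 1/5$ and rate $n^{-2/5}$; for $r = 3$ and $r = 5$ the rates are $n^{-4/9}$ and $n^{-6/13}$, approaching $n^{-1/2}$ from below as $r$ grows. The same exponent is also the lower bound that the first-step rate $p$ must exceed.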

Although the proposed estimator is asymptotically biased, it is possible to eliminate the asymptotic bias while maintaining the maximal rate of convergence, in the manner suggested by Bierens (1987).

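12 We can also derive an asymptotic representation for $\hat\beta_n$ in the case where $\gamma$ is estimated at rate $n^{-p}$ that is slower than $(nh_n)^{-1/2}$. In this case we obtain $n^p(\hat\beta_n - \beta) = \Sigma_{xx}^{-1}\Omega\,n^p(\hat\gamma_n - \gamma) + o_p(1)$, which implies that $\hat\beta_n$ converges at the same rate as $\hat\gamma_n$, which is slower than the optimal rate obtained for the infeasible estimator $\tilde\beta_n$, that is, when $\gamma$ is known.

COROLLARY: Let $\hat\beta_n$ be the estimator with window width $h_n = h\cdot n^{-1/(2(r+1)+1)}$, and $\hat\beta_{n,\delta}$ the estimator with window width $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, where $\delta \in (0, 1)$. Define

$$\hat{\hat\beta}_n \equiv \frac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\,\hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}}.$$

Then $n^{(r+1)/(2(r+1)+1)}(\hat{\hat\beta}_n - \beta) \to_d N(0,\ h^{-1}\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.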
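The bias correction in the Corollary is a fixed linear combination of two estimates computed with different bandwidths. A minimal sketch follows, assuming the two estimates have already been computed; the function name `bias_corrected` and its argument names are hypothetical.

```python
def bias_corrected(beta_n, beta_n_delta, n, r, delta):
    """Combine two estimates as in the Corollary (illustrative helper).

    beta_n:       estimate with bandwidth h * n ** (-1 / (2(r+1)+1))
    beta_n_delta: estimate with bandwidth h * n ** (-delta / (2(r+1)+1))
    n:            sample size
    r:            smoothness order (kernel order is r + 1)
    delta:        constant in (0, 1)
    """
    exponent = (1.0 - delta) * (r + 1) / (2 * (r + 1) + 1)
    c = n ** (-exponent)  # ratio of the two leading bias terms
    return (beta_n - c * beta_n_delta) / (1.0 - c)
```

The combination works because the leading bias of each estimate is proportional to its bandwidth raised to the power $r + 1$, so the factor `c` is exactly the ratio of the two bias terms: subtracting `c * beta_n_delta` cancels the leading bias, and dividing by `1 - c` restores the scale. The function accepts scalars or NumPy arrays.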

In order to compute $\hat\beta_n$ or $\hat{\hat\beta}_n$ in an application, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator will likely be more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant $h$. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.

For a given order of differentiability $r$ and a given sample size $n$, the results of Theorem 1 suggest that $h_n = h\cdot n^{-\mu}$ be chosen so that $\mu = 1/(2(r+1)+1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant $h$. A natural way to proceed (see Horowitz (1992) and Hardle (1990)) is to choose $h$ so as to minimize some kind of measure of the distance of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

$$MSE \equiv \operatorname{trace}\Big[A\,\Sigma_{xx}^{-1}\big(h^{2(r+1)}\Sigma_{x\lambda}\Sigma_{x\lambda}' + h^{-1}\Sigma_{xv}\big)\Sigma_{xx}^{-1}\Big]$$

for any nonstochastic positive semidefinite matrix $A$ that satisfies $\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda} > 0$. It is straightforward to show that MSE is minimized by setting

(3.2.1) $$h = h^* = \left[\frac{\operatorname{trace}\big[\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}A\big]}{2(r+1)\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda}}\right]^{1/(2(r+1)+1)}.$$
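The MSE-minimizing constant trades a variance term that falls with $h$ against a squared-bias term that grows with $h$. A hedged sketch of the plug-in formula, taking the three matrices as given inputs; the function name, the argument names, and the default choice $A = I$ are assumptions for illustration.

```python
import numpy as np

def optimal_bandwidth_constant(sigma_xx, sigma_xv, sigma_xlam, r, A=None):
    """MSE-minimizing bandwidth constant (illustrative helper).

    sigma_xx, sigma_xv: (k, k) arrays
    sigma_xlam:         (k,) array
    r:                  smoothness order
    A:                  (k, k) positive semidefinite weighting matrix;
                        defaults to the identity (an assumption)
    """
    k = sigma_xx.shape[0]
    A = np.eye(k) if A is None else A
    sxx_inv = np.linalg.inv(sigma_xx)
    # Variance part of the MSE: trace(Sxx^-1 Sxv Sxx^-1 A)
    variance_term = float(np.trace(sxx_inv @ sigma_xv @ sxx_inv @ A))
    # Squared-bias part: Sxl' Sxx^-1 A Sxx^-1 Sxl (must be positive)
    bias_vec = sxx_inv @ sigma_xlam
    bias_term = float(bias_vec @ A @ bias_vec)
    return (variance_term / (2 * (r + 1) * bias_term)) ** (1.0 / (2 * (r + 1) + 1))
```

In the scalar case with $r = 1$, unit $\Sigma_{xx}$ and $\Sigma_{x\lambda}$, and $\Sigma_{xv} = 4$, the criterion is $h^4 + 4/h$, whose derivative $4h^3 - 4/h^2$ vanishes at $h = 1$, which is the value the function returns.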

This last expression suggests that we may construct a consistent estimate of $h^*$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{xv}$ and $\Sigma_{x\lambda}$.

THEOREM 2:$^{13}$ Assume that Assumptions R1-R12 hold. (a) Let $\hat\beta_n$ be a consistent estimator of $\beta$ based on $h_n = h\cdot n^{-1/(2(r+1)+1)}$, and define $\hat e_i \equiv \Delta y_i - \Delta x_i\hat\beta_n$.

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's world wide web page.


Then

(b) Let $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then, for $\hat e_i$ defined as in part (a),

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given $r$ and $n$, choose any $h_n = h\cdot n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, with $h$ an arbitrary positive constant and $0 < \delta < 1$. Compute $\hat\beta_n$ based on $h_n$ and construct $\hat e_i$ as defined in Theorem 2. Use $\hat e_i$ to compute the estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ as discussed above. Then estimate $h^*$ by $\hat h^*$ using equation (3.2.1), with $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h^*$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the plug-in method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant $h$. Second, by specifying the order of differentiability $r$, the researcher is restricted to a certain smoothness class.

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h\cdot n^{-1/(2(r+1)+1)}$, run an OLS regression of $\tilde y_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta y_i\,\Phi_i$ on $\tilde x_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta x_i\,\Phi_i$ and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix, using the residuals from the regression, $\hat e_i = \tilde y_i - \tilde x_i\hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\tilde y_i$ on $\tilde x_i$ with bandwidth $h_n = h_{n,\delta}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$

in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency$^{14}$ of $\hat\beta_n$ if $h_n^{-2}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e., for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$.

The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-n asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.
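The regression-based computation of the main-equation estimates described earlier in this subsection (OLS on square-root-kernel-weighted differences, with Eicker-White standard errors) can be sketched with NumPy alone. This is an illustrative sketch, not the author's routine: the function names are hypothetical, a standard normal kernel is assumed, and the covariance is the textbook heteroskedasticity-robust sandwich form without degrees-of-freedom adjustment, which may differ in detail from the paper's expression.

```python
import numpy as np

def kernel_transform(dx, dy, index_diff, phi, h):
    """Scale differenced data by sqrt(K(index_diff / h)) * phi.

    index_diff: (n,) estimated index differences (dw @ gamma_hat)
    phi:        (n,) 0/1 indicator of selection in both periods
    A standard normal kernel is assumed here.
    """
    k_val = np.exp(-0.5 * (index_diff / h) ** 2) / np.sqrt(2.0 * np.pi)
    scale = np.sqrt(k_val) * phi
    return dx * scale[:, None], dy * scale

def ols_white(x_tilde, y_tilde):
    """OLS coefficients with Eicker-White (sandwich) standard errors."""
    xtx_inv = np.linalg.inv(x_tilde.T @ x_tilde)
    beta = xtx_inv @ (x_tilde.T @ y_tilde)
    resid = y_tilde - x_tilde @ beta
    meat = (x_tilde * resid[:, None] ** 2).T @ x_tilde  # sum of x'x e^2
    cov = xtx_inv @ meat @ xtx_inv
    se = np.sqrt(np.maximum(np.diag(cov), 0.0))         # guard against rounding
    return beta, se
```

Because the transformed variables carry the square root of the kernel, the OLS normal equations reproduce the kernel-weighted sums of the estimator; the common factor $1/h_n$ cancels and is therefore omitted.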

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3}h_n^2 \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator, which under weak (but stronger than Manski's) assumptions is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness

14 Consistency of p may be established under the weaker restriction that zllF - yll = o(l) The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion This modification does not alter the basic restriction for obtaining an asymptotic distribution for 6which does not depend on the estimation of y in the first step namely that y has to be estimated at a faster rate than p Notice that in this case the upper bound on u in Assumption R12 would have to be replaced by ( 6p- 1)7 However this modification would affect the proof of Theorem 2 which would become unnecessarily complicated and long


we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, the first step estimation method, the performance in practice of the proposed plug-in method for estimating the bandwidth constant, and finally the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where p O = 1 y = y = 1 w and w2 are independent N( -1 l ) variables q = (w + w)2 + 25 with 5 an independent variable distributed uni- formly over the interval (01) u is logistically distributed normalized to have variance equal to 1 x= w a = + w )2 + 5 with 5 an indepen- (w dent N(0 2) variable and s = 08t3 + 06ul with 5 an independent standard normal variable All data are generated iid across individuals and over time This design implies that Pr(d + d = 1)= 037 and Pr(d = d = 1) = 031 so that approximately 37 percent of each sample is used in the first step estimation of the selection equation and approximately 31 percent in the second step Each Monte Carlo experiment is performed 1000 times while the same pseudoran- dom number sequences are used for each one of three different sample sizes n 250 1000 and 4000

Table I presents the finite sample properties of the naive estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences using only those individuals that are selected into the sample in both time periods, i.e., those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator, with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$, at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$. Columns: Mean Bias, Median Bias, RMSE, MAD.

Panel B: Sizes of t tests. Columns: 0.01, 0.05, 0.10, 0.20.

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(\nu) = \phi(\nu)$, where $\phi$ is the density of the standard normal distribution, which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h\cdot n^{-1/(2(r+1)+1)} = h\cdot n^{-1/5}$ with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1 which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.$^{15}$ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_L$, which in this case will be consistent, since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,$^{16}$ denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator,$^{17}$ denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$

Left block: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right block: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction). Columns in each block: Mean Bias, Median Bias, RMSE, MAD.

Panel A (True $\gamma$): 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B ($\hat\gamma_L$): 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C ($\hat\gamma_{CMS}$): 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D ($\hat\gamma_{SCMS4}$): 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E ($\hat\gamma_{SCMS2}$): 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E) the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window

15 In the construction of the kernel weights of both the infeasible estimator $\tilde\beta_n$ of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i=1}^n \Delta d_i\,1\{\Delta w_{1i}g_1 + \Delta w_{2i}g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.
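The grid search described in this footnote is simple to express in code. The sketch below assumes two selection covariates and evaluates the objective at every grid angle at once; the function name `cms_grid_search` and the array layout are illustrative assumptions, not the routine used in the study.

```python
import numpy as np

def cms_grid_search(dd, dw, num_points=2000):
    """Conditional maximum score by grid search on the unit circle.

    dd: (n,) differences of the selection indicators (values in -1, 0, 1)
    dw: (n, 2) differences of the two selection covariates
    Returns the maximizing unit vector (g1, g2) and the objective value.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, num_points)
    candidates = np.column_stack([np.sin(angles), np.cos(angles)])  # (num_points, 2)
    index = dw @ candidates.T                                       # (n, num_points)
    objective = (dd[:, None] * (index >= 0.0)).mean(axis=0)
    best = int(np.argmax(objective))  # first maximizer if there are ties
    return candidates[best], float(objective[best])
```

Because the objective is a step function of the angle, its maximizer is typically an interval rather than a point; `np.argmax` simply returns the first grid point attaining the maximum.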

17 The SCMS estimates are computed by maximizing the smoothed objective function over all $g \in \mathbb{R}^2$ that have $|g_2| = 1$ and $g_1$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(\nu) = K_4(\nu)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(\nu) = \Phi(\nu)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5\cdot n^{-1/(2m+1)}$ ($m = 2$ or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details, see Horowitz (1992) and Kyriazidou (1994).


width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.

In Table III, we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\tilde\beta_n$ and its bias-corrected counterpart using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).$^{18}$

In Panels C and D we use a fourth and a sixth order bias-reducing kernel$^{19}$ and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h^*$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h^*$ is larger than the initial constant (here the initial bandwidth constant is set equal to one$^{20}$). Table V displays the mean of $\hat h^*$ across 1000 replications for different specifications of the initial constant for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

19 The fourth-order kernel is $K(\nu) = 1.1\exp(-\nu^2/2) - 0.1\exp(-\nu^2/(2\cdot 11))(1/\sqrt{11})$, and the sixth-order kernel is $K(\nu) = 1.5\exp(-\nu^2/2) - 0.6\exp(-\nu^2/(2\cdot 4))(1/\sqrt{4}) + 0.1\exp(-\nu^2/(2\cdot 9))(1/\sqrt{9})$. See Bierens (1987).
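The orders of these two kernels can be confirmed by checking which moments vanish. The sketch below writes each kernel as a mixture of normal densities and includes the factor $1/\sqrt{2\pi}$ so that the kernel integrates to one; that normalization, the helper names, and the integration grid are assumptions made for the check, and a common scale factor does not affect the estimator.

```python
import numpy as np

def normal_density(v, sigma=1.0):
    """Density of N(0, sigma^2)."""
    return np.exp(-0.5 * (v / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def k4(v):
    """Fourth-order kernel: mixture of N(0, 1) and N(0, 11)."""
    return 1.1 * normal_density(v) - 0.1 * normal_density(v, np.sqrt(11.0))

def k6(v):
    """Sixth-order kernel: mixture of N(0, 1), N(0, 4), and N(0, 9)."""
    return (1.5 * normal_density(v)
            - 0.6 * normal_density(v, 2.0)
            + 0.1 * normal_density(v, 3.0))

def moment(kernel, j, half_width=80.0, num=400001):
    """Trapezoid-rule approximation of the j-th kernel moment."""
    v = np.linspace(-half_width, half_width, num)
    f = v**j * kernel(v)
    dv = v[1] - v[0]
    return float(np.sum(0.5 * (f[1:] + f[:-1])) * dv)
```

The mixture weights are chosen so that the variance-weighted sums cancel: $1.1 - 0.1\cdot 11 = 0$ for the fourth-order kernel, and $1.5 - 0.6\cdot 4 + 0.1\cdot 9 = 0$ together with $1.5 - 0.6\cdot 16 + 0.1\cdot 81 = 0$ for the sixth-order kernel.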

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood, when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size $n = 100000$.


TABLE III

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: TRUE $\gamma$

Left block: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right block: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction). Columns in each block: Mean Bias, Median Bias, RMSE, MAD.

Panel A ($K(\nu) = \phi(\nu)$, $h_n = 0.5\cdot n^{-1/5}$): 0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B ($K(\nu) = \phi(\nu)$, $h_n = 3\cdot n^{-1/5}$): 0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C ($K(\nu) = K_4(\nu)$, $h_n = n^{-1/9}$): 0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D ($K(\nu) = K_6(\nu)$, $h_n = n^{-1/13}$): 0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for $n = 250$, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A, and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

The estimated standard errors of the median bias estimates for $n = 250$, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A, and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

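TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $K(v) = \phi(v)$, $h_n = \hat h\cdot n^{-1/5}$, INITIAL $h = 1$

$\hat\beta_n$ (Without Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD
$\hat{\hat\beta}_n$ (With Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD

Panel A: True $\gamma$
n = 250: 0.1919 0.1287 0.0261
n = 1000: 0.1053 0.0700 0.0330
n = 4000: 0.0653 0.0507 0.0273

Panel B: $\hat\gamma_{L}$
n = 250: 0.1703 0.1191 0.0454
n = 1000: 0.1000 0.0693 0.0465
n = 4000: 0.0654 0.0504 0.0385

Panel C: $\hat\gamma_{CMS}$
n = 250: 0.2117 0.1329 0.0221
n = 1000: 0.1114 0.0718 0.0246
n = 4000: 0.0671 0.0507 0.0246

Panel D: $\hat\gamma_{SCMS}$
n = 250: 0.1543 0.1086 0.0705
n = 1000: 0.1004 0.0740 0.0604
n = 4000: 0.0658 0.0488 0.0401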


TABLE V

Initial h = 0.5, Initial h = 1, Initial h = 2, Initial h = 3

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the

[Figure 1, panels 1a-1c: True $\gamma$]

Note: Figures 1a, 1d, 1g: n = 250. Figures 1b, 1e, 1h: n = 1000. Figures 1c, 1f, 1i: n = 4000.

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h_n = n^{-1/5}$, $K(v) = \phi(v)$.

estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

Finally, we examine the size of t tests, where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$. To this end, we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case, due to the slow rate of convergence of the first-step estimator of $\gamma$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
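The empirical size calculation described above can be sketched as below. This is an illustration only: it assumes two-sided tests with standard normal critical values, and the function and argument names are hypothetical rather than taken from the paper.

```python
from statistics import NormalDist


def rejection_rates(estimates, std_errors, true_value,
                    levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of replications in which |t| exceeds the two-sided
    standard normal critical value, for each nominal level."""
    rates = {}
    for level in levels:
        critical = NormalDist().inv_cdf(1 - level / 2)
        rejections = sum(
            abs((b - true_value) / se) > critical
            for b, se in zip(estimates, std_errors)
        )
        rates[level] = rejections / len(estimates)
    return rates
```

With one estimate and one standard error per Monte Carlo replication, the returned fractions are the "actual levels" to be compared with the nominal levels 0.01, 0.05, 0.10, and 0.20.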

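TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = n^{-1/5}$, $K(v) = \phi(v)$

$\hat\beta_n$ (Without Asymptotic Bias Correction): 0.01, 0.05, 0.10, 0.20
$\hat{\hat\beta}_n$ (With Asymptotic Bias Correction): 0.01, 0.05, 0.10, 0.20

Panel A: True $\gamma$
n = 250: 0.1610 0.2530 0.0590
n = 1000: 0.1240 0.2180 0.0260
n = 4000: 0.1120 0.2260 0.0210

Panel B: $\hat\gamma_{L}$
n = 250: 0.1580 0.2680 0.0450
n = 1000: 0.1160 0.2140 0.0230
n = 4000: 0.1140 0.2250 0.0180

Panel C: $\hat\gamma_{CMS}$
n = 250: 0.1600 0.2720 0.0610
n = 1000: 0.1170 0.2160 0.0350
n = 4000: 0.1180 0.2390 0.0240

Panel D: $\hat\gamma_{SCMS}$
n = 250: 0.1430 0.2570 0.0280
n = 1000: 0.1220 0.2250 0.0190
n = 4000: 0.1230 0.2430 0.0250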

5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMA A1: Let $S_n = (1/n)\sum_{i=1}^n (1/h_n) L(W_i/h_n) Z_i W_i^s$, $s \ge 0$, where $\{(Z_i, W_i)\}_{i=1}^n$ is a random sample from a distribution that has $E(|Z|^2 \mid W) < M < \infty$ for almost all $W$, and the function $L$ satisfies $\int |v^s L(v)|\,dv < M$. Then $E(S_n) = O(h_n^s)$ and $\mathrm{var}(S_n) = O(h_n^{2s}/(n h_n))$. Thus, for $s \ge 1$, $S_n \to_p 0$, while for $s = 0$, $S_n \to_p f_W(0) E(Z \mid W = 0) \int L(v)\,dv$, provided that $E(Z \mid W)$ is continuous at $W = 0$.

PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

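LEMMA A2 (Liapounov CLT for double arrays): Let $\xi_n = (1/\sqrt{n})\sum_{i=1}^n \xi_{in}$, where $\{\xi_{in}\}$ is an independent sequence of scalar random variables that satisfies $E(\xi_{in}) = 0$, $\mathrm{var}(\xi_{in}) < \infty$, $\mathrm{var}(\xi_n) \to V < \infty$, and $\sum_{i=1}^n E|\xi_{in}/\sqrt{n}|^{2+\delta} \to 0$ for some $\delta \in (0,1)$ as $n \to \infty$. Then $\xi_n \to_d N(0, V)$.

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1974).

COROLLARY A1: Let $\xi_{in} = (1/\sqrt{h_n}) L(W_i/h_n) Z_i$, where $\{(Z_i, W_i)\}_{i=1}^n$ is a random sample from a distribution such that $E(Z \mid W) = 0$ and $E(|Z|^{2+\delta} \mid W) < M < \infty$ for almost all $W$, $E(Z^2 \mid W)$ is continuous at $W = 0$, and the function $L$ satisfies $\int |L(v)|^{2+\delta}\,dv < \infty$. Then $\sqrt{n h_n}\,S_n = (1/\sqrt{n})\sum_{i=1}^n \xi_{in} \to_d N\big(0,\ f_W(0)\,E(Z^2 \mid W = 0)\int L(v)^2\,dv\big)$.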

PROOF OF LEMMA 1: (a) Apply Lemma A1 with $Z = \Delta x_j \Delta x_l$ $(j, l = 1, \ldots, k)$, $s = 0$, and $L(v) = K(v)$.

(b-i) Apply Lemma A2 with $\xi_{in} = c'(1/\sqrt{h_n}) K(W_i/h_n)\,\Delta x_i' \Delta\nu_i$, where $c$ is a $k \times 1$ vector of constants such that $c'c = 1$.

(b-ii) Note that by Assumption R5, $\Delta\lambda = \bar\Lambda W$. Thus we may write

$$S_{x\lambda} = \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n} K\!\left(\frac{W_i}{h_n}\right) \Delta x_i'\,\bar\Lambda_i W_i .$$

Therefore $E(S_{x\lambda}) = \int (1/h_n) K(W/h_n)\,W\,g(W)\,dW$, where $g(W) \equiv E(\Delta x' \bar\Lambda \mid W) f_W(W)$ is by assumption $r$ times continuously differentiable with derivatives that are bounded on the support of $W$, and has $g(0) < \infty$. A Taylor series expansion of $g(\cdot)$ around 0 and a change of variables $W = v h_n$ lead to


for some $c^*$ lying between 0 and $W$, since $\int v^j K(v)\,dv = 0$ for $j = 1, \ldots, r$. Therefore, by bounded convergence,

since under our assumptions $\int |v|^{r+1} |K(v)|\,dv < \infty$ and by assumption $\sqrt{n h_n}\,h_n^{r+1} \to \tilde h$. Furthermore,

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$.

(iii) In the Taylor series expansions, $c^*$ stands for a generic value between $W$ and $\hat W$.

PROOF OF LEMMA 2: (a) By a Taylor series expansion we can write

Therefore

since by assumption $\mu < p/2$, $|K'(v)| < M$, and $E(\|\Delta w\|\,\|\Delta x\|^2) < \infty$.

(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that $A_1$ and $A_2$ are $O_p(1)$ while $A_3 = o_p(1)$. The desired result will then follow from the fact that $\mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$.

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

We will show that Bl and B are 0(1) while B3 = o(l) Thc desired result will thenfollow from the fact that 1 - 2 p lt u lt p 2 implies that hi1(- y ) = Op(nL-) = o(l) and - y ) = o(n -~-) = o(l)


Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for r r 1

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+


where Ai and A are defined as in the proof of part (b-1) As we showed there both these quantities are bounded in probability for any h that satisfies h -O and nh -t 13 as n increases Furthermore from (1) above hi1( - y) = op(l) T ~ L I S OP(nF-) = the first two terms of the sum above are o(l) Now by (21

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where Bi and B2 are defined as in the proof of part (b-ii) and as we showed there they arc houndcd in probability for any I that satisfies nh + 13 as n increases Thus the first two terms of the sum above are o(l) Furthermore

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HARDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of A Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.


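It is convenient to rewrite the main equation (2.1) as a partially linear regression:

$$y_{it} = x_{it}\beta + \alpha_i + \lambda_{it} + \nu_{it},$$

where $\nu_{it} \equiv \varepsilon_{it} - \lambda_{it}$ is a new error term, which by construction satisfies $E(\nu_{it} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i) = 0$. The idea of our scheme for estimating $\beta$ is to difference out the nuisance terms $\alpha_i$ and $\lambda_{it}$ from the equation above.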

As a motivation of our estimation procedure, consider the case where $(\varepsilon^*_{it}, u_{it})$ is independent and identically distributed over time and across individuals and is independent of $\zeta_i$. Under these assumptions, it is easy to see that

$$\lambda_{it} = E(\varepsilon^*_{it} \mid u_{it} \le w_{it}\gamma + \eta_i) = \Lambda(w_{it}\gamma + \eta_i),$$

where $\Lambda(\cdot)$ is an unknown function, the same over time and across individuals, of the single index $w_{it}\gamma + \eta_i$. Obviously, in general, $\lambda_{i1} \ne \lambda_{i2}$ unless $w_{i1}\gamma = w_{i2}\gamma$. In other words, for an individual $i$ that has $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$, the sample selection effect $\lambda_{it}$ will be the same in the two periods. Thus, for this particular individual, applying first-differences in equation (2.1) will eliminate both the unobservable effect $\alpha_i$ and the selection effect $\lambda_{it}$. At this point, it is important to notice that even if the functional form of $\Lambda$ were known (as, for example, in the case of a bivariate normal distribution; see Heckman (1976)), it would still involve the unobservable effect $\eta_i$. This suggests that it would be generally infeasible to consistently estimate $\beta$ from (2.1), even in the absence of the effect $\alpha_i$ and with knowledge of $\gamma$, unless a parametric form for the distribution of $\eta_i$ conditional on the observed exogenous variables were also specified.

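The preceding argument for differencing out both nuisance terms from equation (2.1) will hold under much weaker distributional assumptions. In particular, since first-differences are taken on an individual basis, it is not required that $(\varepsilon^*_{it}, u_{it})$ be i.i.d. across individuals, nor that it be independent of the individual-specific vector $\zeta_i$. In other words, we may allow the functional form of $\Lambda$ to vary across individuals. It is also possible to allow for serial correlation in the errors. Consider, for example, the case where $(\varepsilon^*_{i1}, \varepsilon^*_{i2}, u_{i1}, u_{i2})$ and $(\varepsilon^*_{i2}, \varepsilon^*_{i1}, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$, i.e. $F(\varepsilon^*_{i1}, \varepsilon^*_{i2}, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon^*_{i2}, \varepsilon^*_{i1}, u_{i2}, u_{i1} \mid \zeta_i)$. Under this conditional exchangeability assumption, it is easy to see that for an individual $i$ that has $w_{i1}\gamma = w_{i2}\gamma$,

$$\lambda_{i1} = \lambda_{i2}.$$

Notice that, in general, it is not sufficient to assume joint conditional stationarity of the errors. An extreme example is the case where $\varepsilon^*_{i1}$, $\varepsilon^*_{i2}$, and $u_{i1}$ are i.i.d. $N(0,1)$ and independent of $\zeta_i$, while $u_{i2} = \varepsilon^*_{i1}$. Then $\lambda_{i1} = E(\varepsilon^*_{i1} \mid \varepsilon^*_{i1} \le w_{i2}\gamma + \eta_i) \ne \lambda_{i2} = E(\varepsilon^*_{i2})$, regardless of whether $w_{i1}\gamma = w_{i2}\gamma$.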
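The extreme example above can be illustrated by simulation. The sketch below is only an illustration under the stated design (outcome errors and the first-period selection error i.i.d. standard normal, second-period selection error equal to the first-period outcome error). The function name and the convention that selection occurs when the selection error is at most the index are assumptions made for this example.

```python
import math
import random


def simulate_selection_terms(index1, index2, draws=200_000, seed=0):
    """Monte Carlo means of eps1 and eps2 among units selected in both periods.

    index1 and index2 play the role of the period-1 and period-2 selection
    indices; a unit is selected in a period when its selection error is
    at most the index.
    """
    rng = random.Random(seed)
    sum1 = sum2 = 0.0
    kept = 0
    for _ in range(draws):
        eps1 = rng.gauss(0.0, 1.0)
        eps2 = rng.gauss(0.0, 1.0)
        u1 = rng.gauss(0.0, 1.0)
        u2 = eps1  # second-period selection error equals first-period outcome error
        if u1 <= index1 and u2 <= index2:
            sum1 += eps1
            sum2 += eps2
            kept += 1
    return sum1 / kept, sum2 / kept
```

With both indices equal to zero, the first mean is close to $-\sqrt{2/\pi} \approx -0.80$ and the second is close to zero, so the two selection terms differ even though the indices coincide.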


The above discussion, which presumes knowledge of the true $\gamma$, suggests estimating $\beta$ by OLS from a subsample that consists of those observations that have $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$. Defining $\Psi_i \equiv 1\{w_{i1}\gamma = w_{i2}\gamma\}$, $\Phi_i \equiv 1\{d_{i1} = d_{i2} = 1\} = d_{i1}d_{i2}$, and with $\Delta$ denoting first differences, the OLS estimator is of the form $\check\beta_n = [\sum_{i=1}^n \Delta x_i'\Delta x_i \Psi_i\Phi_i]^{-1}[\sum_{i=1}^n \Delta x_i'\Delta y_i \Psi_i\Phi_i]$. Under appropriate regularity conditions, this estimator will be consistent and root-n asymptotically normal. An obvious requirement is that $\Pr(\Delta w_i\gamma = 0) > 0$, which may be satisfied, for example, when all the random variables in $w_{it}$ are discrete, or in experimental cases where the distribution of $w_{it}$ is in the control of the researcher, situations that are rare in economic applications.

Of course, this estimation scheme cannot be directly implemented since $\gamma$ is unknown. Furthermore, as argued above, it may be the case that $\Psi_i = 0$ (i.e. $\Delta w_i\gamma \ne 0$) for all individuals in our sample. Notice, though, that if $\Lambda$ is a sufficiently smooth function and $\hat\gamma_n$ is a consistent estimate of $\gamma$, observations for which the difference $\Delta w_i\hat\gamma_n$ is close to zero should also have $\Delta\lambda_i \cong 0$, and the preceding arguments would hold approximately.

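We therefore propose the following two-step estimation procedure, which is in the spirit of Powell (1987) and Ahn and Powell (1993). In the first step, $\gamma$ is consistently estimated based on equation (2.2) alone. In the second step, the estimate $\hat\gamma_n$ is used to estimate $\beta$, based on those pairs of observations for which $w_{i1}\hat\gamma_n$ and $w_{i2}\hat\gamma_n$ are close. Specifically, we propose

$$\hat\beta_n = \Big[\sum_{i=1}^n \hat\psi_{in}\,\Delta x_i'\Delta x_i\,\Phi_i\Big]^{-1}\Big[\sum_{i=1}^n \hat\psi_{in}\,\Delta x_i'\Delta y_i\,\Phi_i\Big], \qquad (2.3)$$

where $\hat\psi_{in}$ is a weight that declines to zero as the magnitude of the difference $|w_{i1}\hat\gamma_n - w_{i2}\hat\gamma_n|$ increases. We choose kernel weights of the form

$$\hat\psi_{in} = \frac{1}{h_n} K\!\left(\frac{\Delta w_i\hat\gamma_n}{h_n}\right), \qquad (2.4)$$

where $K$ is a kernel density function and $h_n$ is a sequence of bandwidths which tends to zero as $n \to \infty$. Thus, for a fixed (nonzero) magnitude of the difference $|\Delta w_i\hat\gamma_n|$, the weight $\hat\psi_{in}$ shrinks as the sample size increases, while for a fixed $n$, a larger $|\Delta w_i\hat\gamma_n|$ corresponds to a smaller weight.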
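The second step of the procedure can be sketched for the special case of a single regressor, where the weighted least squares formula reduces to a ratio of weighted sums. This is an illustrative sketch rather than the paper's implementation: the function names are hypothetical, the default kernel is the standard normal density, and `dw_index` is assumed to hold the estimated index differences (the first-differenced selection regressors multiplied by the first-step estimate).

```python
import math


def normal_kernel(v):
    """Standard normal density, used as a second order kernel."""
    return math.exp(-0.5 * v * v) / math.sqrt(2 * math.pi)


def kernel_weighted_beta(dx, dy, dw_index, selected, bandwidth,
                         kernel=normal_kernel):
    """Kernel-weighted least squares on first differences, scalar regressor.

    dx, dy    : first differences of the regressor and the outcome
    dw_index  : estimated index differences (assumed precomputed)
    selected  : 1 if the unit is observed in both periods, else 0
    bandwidth : the bandwidth h_n
    """
    numerator = 0.0
    denominator = 0.0
    for dxi, dyi, dwi, phi in zip(dx, dy, dw_index, selected):
        if not phi:
            continue  # units not selected in both periods contribute nothing
        psi = kernel(dwi / bandwidth) / bandwidth  # kernel weight
        numerator += psi * dxi * dyi
        denominator += psi * dxi * dxi
    if denominator == 0.0:
        raise ValueError("no selected observations with nonzero weight")
    return numerator / denominator
```

Observations whose estimated index difference is large relative to the bandwidth receive weights near zero, so the estimate is driven by pairs with nearly equal selection indices.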

It is interesting to note that the arguments used in estimating the main regression equation may be modified to accommodate the case of a truncated sample, that is, when we only observe those individuals that have $d_{it} = 1$ for all time periods. Recall that our method for eliminating the sample selection effect from equation (2.1) is based on the fact that, under certain distributional assumptions, $\Delta w_i\gamma = 0$ implies $\Delta\lambda_i = 0$. However, $\Delta w_i = 0$ also implies $\Delta\lambda_i = 0$. In other words, we might dispense altogether with the first step of estimating $\gamma$, and estimate $\beta$ from those observations for which $w_{i1}$ and $w_{i2}$ are close, which would suggest using the weights $\hat\psi_{in} = (1/h_n)K(\Delta w_i/h_n)$. Although this approach would imply a slower rate of convergence for the resulting estimator, this estimation scheme may be used for estimating $\beta$ from a truncated sample, in which case estimation of the selection equation is infeasible. An obvious drawback of this method is that, in order to consistently estimate the entire parameter vector $\beta$, we would have to impose the restriction that $w_{it}$ and $x^*_{it}$ do not contain any elements in common.

The above analysis extends naturally to the case of a longer (and possibly unbalanced) panel, that is, when $T_i \ge 2$. Then $\beta$ could be estimated from those observations that have $d_{it} = d_{is} = 1$ and for which $w_{it}\hat\gamma_n$ and $w_{is}\hat\gamma_n$ are close, for all $s, t = 1, \ldots, T_i$. The estimator is of the form

where

In the following section we derive the asymptotic properties of our proposed estimator for the main equation of interest, under the assumption that $\gamma$ has been consistently estimated. At the end of the section we examine the applicability of existing estimators for obtaining first-step estimates of the selection equation.

3. ESTIMATION OF THE MAIN EQUATION

3.1 Asymptotic Properties of the Estimator

The derivation of the large sample properties of $\hat\beta_n$ of equations (2.3) and (2.4) proceeds in two steps. First, the asymptotic behavior of the infeasible estimator, which uses the true $\gamma$ in the construction of the kernel weights, denoted by $\tilde\beta_n$, is analyzed. Then the large sample behavior of the difference $(\hat\beta_n - \tilde\beta_n)$ is investigated.

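It will be useful to define the scalar index $W_i \equiv \Delta w_i\gamma$ and its estimated counterpart $\hat W_i \equiv \Delta w_i\hat\gamma_n$, along with the following quantities:

$$S_{xx} = \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\Delta x_i\Phi_i, \quad S_{x\nu} = \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\Delta\nu_i\Phi_i, \quad S_{x\lambda} = \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\Delta\lambda_i\Phi_i,$$

and $\hat S_{xx}$, $\hat S_{x\nu}$, $\hat S_{x\lambda}$, defined in the same way with $\hat W_i$ in place of $W_i$.

With these definitions we can write $\tilde\beta_n - \beta = S_{xx}^{-1}(S_{x\nu} + S_{x\lambda})$ and $\hat\beta_n - \beta = \hat S_{xx}^{-1}(\hat S_{x\nu} + \hat S_{x\lambda})$.

Our asymptotic results for the infeasible estimator are based on the following assumptions. From Section 2, $\Phi_i \equiv d_{i1}d_{i2}$, $\zeta_i \equiv (w_{i1}, w_{i2}, x^*_{i1}, x^*_{i2}, \alpha^*_i, \eta_i)$, and $\nu_{it} \equiv \varepsilon_{it} - E(\varepsilon_{it} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$.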

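ASSUMPTION R1: $(\varepsilon^*_{i1}, \varepsilon^*_{i2}, u_{i1}, u_{i2})$ and $(\varepsilon^*_{i2}, \varepsilon^*_{i1}, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$. That is, $F(\varepsilon^*_{i1}, \varepsilon^*_{i2}, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon^*_{i2}, \varepsilon^*_{i1}, u_{i2}, u_{i1} \mid \zeta_i)$.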

As discussed in Section 2 this conditional exchangeability assumption is crucial to our method for eliminating the sample selection effect Although in principle we could allow F to vary across individuals it will be convenient for our analysis to assume that cross-section sampling is random

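ASSUMPTION R2: An i.i.d. sample $\{(x^*_{it}, \varepsilon^*_{it}, \alpha^*_i, w_{it}, \eta_i, u_{it});\ t = 1, 2\}_{i=1}^n$ is drawn from the population. For each $i = 1, \ldots, n$ and each $t = 1, 2$, we observe $(d_{it}, w_{it}, y_{it}, x_{it})$.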

With this assumption we may from now on drop the subscripts i that denote the identity of each panel member

ASSUMPTION R3: $E(\Delta x'\Delta x \mid W = 0)$ is finite and nonsingular.

Note that this assumption implicitly imposes an exclusion restriction on the set of regressors, namely that at least one of the variables in the selection equation, $w_{it}$, is not contained in $x^*_{it}$.

ASSUMPTION R4: The marginal distribution of the index function $W \equiv \Delta w\gamma$ is absolutely continuous, with density function $f_W$, which is bounded from above on its support and strictly positive at zero, i.e. $f_W(0) > 0$. In addition, $f_W$ is almost everywhere $r$ times ($r \ge 1$) continuously differentiable and has bounded derivatives.

Observe that by definition $\Delta x = \Phi\,\Delta x^*$. Thus, although certain assumptions are stated in terms of the observed regressors $x$, they also hold for the latent (possibly unobserved) $x^*$.

It is possible to relax certain smoothness assumptions so that they hold only in a neighborhood of $W$ near zero, at the cost, though, of more technical detail.

ASSUMPTION R5: The unknown function (see footnote 9) $\Lambda(w_1\gamma + \eta, w_2\gamma + \eta, \zeta) \equiv E(\varepsilon_1 \mid d_1 = 1, d_2 = 1, \zeta) = E(\varepsilon^*_1 \mid u_1 \le w_1\gamma + \eta, u_2 \le w_2\gamma + \eta, \zeta)$ satisfies $\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \bar\Lambda\cdot(s_t - s_\tau)$ for $t, \tau = 1, 2$, where $\bar\Lambda$ is a function of $(s_t, s_\tau, \zeta)$, i.e. $\bar\Lambda \equiv \bar\Lambda(s_t, s_\tau, \zeta)$, which is bounded on its support.

This assumption is crucial to our analysis. It will be satisfied, for example, if $\Lambda$ is continuously differentiable with respect to its first two arguments, with bounded first-order partial derivatives (as, for example, when the errors are jointly normally distributed), in which case we may apply the multivariate mean-value theorem.

Here $\Lambda^{(j)}$ ($j = 1, 2$) denotes the first-order partial derivative of $\Lambda$ with respect to its first and second argument, respectively, and $c^*$ lies on the line segment connecting $(w_1\gamma + \eta, w_2\gamma + \eta, \zeta)$ and $(w_2\gamma + \eta, w_1\gamma + \eta, \zeta)$. Thus, in this case $\bar\Lambda = \Lambda^{(1)}(c^*) - \Lambda^{(2)}(c^*)$, and by assumption will be bounded.

ASSUMPTION R6: (a) $x^*_t$ and $\varepsilon^*_t$ have bounded $4 + 2\delta$ moments conditional on $W$, for any $\delta \in (0, 1)$.

(b) $E(\Delta x'\Delta x \mid W)$ and $E(\Delta x'\Delta x\,\Delta\nu^2 \mid W)$ are continuous at $W = 0$ and do not vanish.

(c) $E(\Delta x'\bar\Lambda \mid W)$ is almost everywhere $r$ times continuously differentiable as a function of $W$ and has bounded derivatives.

ASSUMPTION R7: The function $K: \mathbb{R} \to \mathbb{R}$ satisfies: (a) $\int K(v)\,dv = 1$, (b) $\int |K(v)|\,dv < \infty$, (c) $\sup_v |K(v)| < \infty$, (d) $\int |v|^{r+1}|K(v)|\,dv < \infty$, and (e) $\int v^j K(v)\,dv = 0$ for all $j = 1, \ldots, r$.

ASSUMPTION R8: $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$.

From our analysis in Section 2 it is easy to see that Assumptions R1-R3 would suffice to identify $\beta$ for known $\gamma$. An identification scheme in the spirit of our discussion in Section 2 would obviously require support of $W$ at zero, as well as nonsingularity of the matrix $\Sigma_{xx}$ imposed by Assumption R3, analogous to the familiar full rank assumption.

The continuity of the distribution of the index $W$ imposed in Assumption R4 is a regularity condition common in kernel estimation of density and regression functions. It is precisely this continuity that renders the OLS estimator of Section 2 infeasible, even if $\gamma$ were known.

9 Notice that by Assumption R1 the functional form of $\Lambda$ is the same over time for the same individual, while by Assumption R2 it is also the same across individuals.

10 In principle we could dispense with the assumption that $\bar\Lambda$ is bounded by assuming that it has finite fourth moment conditional on $W$.

Since our estimation scheme is based on pairs of observations for which $W_i \equiv \Delta w_i\gamma \cong 0$, it is obvious that additional smoothness conditions are required. These are imposed by Assumptions R4-R8. Notice in particular Assumption R5, which imposes a Lipschitz continuity property on the selection correction function $\Lambda(\cdot)$. It is easy to see that simple continuity will not be sufficient to guarantee that $\Delta\lambda_i \to 0$ as $W_i \to 0$, since $\Delta\lambda_i$ is not a function of $W_i$. Furthermore, similarly to kernel density and regression estimation, a high order of differentiability $r$ for certain functions of the index $W$, along with the appropriate choice of the kernel function and the bandwidth sequence, imply a faster rate of convergence in distribution for $\hat\beta_n$. Specifically, we choose a $(r + 1)$th order bias-reducing kernel, which by Assumption R7(e) is required to be negative in part of its domain.

The next lemma establishes the asymptotic properties of the infeasible estimator $\tilde\beta_n$.

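LEMMA 1: Let Assumptions R1-R8 hold. Define

$\Sigma_{xx} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Phi \mid W = 0)$,

$\Sigma_{xv} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Delta v^2\,\Phi \mid W = 0)\int K(\nu)^2\,d\nu$,

$\Sigma_{x\lambda} \equiv \dfrac{1}{r!}\,g^{(r)}(0)\int \nu^{r+1}K(\nu)\,d\nu$,

where $g^{(r)}(0)$ is the $(k \times 1)$ vector of $r$th-order derivatives of $g(W) \equiv E(\Delta x'\Lambda\,\Phi \mid W)\,f_W(W)$ evaluated at $W = 0$. Then:

(a) $S_{xx} \xrightarrow{p} \Sigma_{xx}$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}\,S_{xv} \xrightarrow{d} N(0, \Sigma_{xv})$ and (ii) $\sqrt{nh_n}\,S_{x\lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\lambda}$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}S_{xv} \xrightarrow{p} 0$ and (ii) $h_n^{-(r+1)}S_{x\lambda} \xrightarrow{p} \Sigma_{x\lambda}$.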

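The asymptotic properties of $\tilde\beta_n$ easily follow from the previous Lemma. If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\tilde\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\tilde\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.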

In order to derive the asymptotic properties of the feasible estimator $\hat\beta_n$, we will make the following additional assumptions.

ASSUMPTIONR9 In addition to the conditions of Assumption R7 the kernel function satisfies (a) K ( v ) is three times continuously differentiable with bounded deriuatiues and (b) IKr(vgtldv lIK(v)l dv l ~ ~ K ( v ) ~ d v and ~ v ~ K ( v ) ~ ~ v are finite


The conditions of Assumption R9 are satisfied, for example, for $K(\nu)$ being the standard normal density function, which is a second order kernel.

ASSUMPTIONR10 xT 87 and w have bounded 8 + 46 moments conditional on W for some 6 E (0 1) In addition E(Axl A u Awj 1 W) and E(AX Au Awj Awm IW) are continuous at W = 0 for all 1 = 1 k and j m =

1 q

ASSUMPTION R11: The parameter vector $\gamma$ in the selection equation lies in a compact$^{11}$ set, and $\hat\gamma_n$ is a consistent estimator that satisfies $\hat\gamma_n - \gamma = O_p(n^{-p})$, where $2/5 < p \le 1/2$.

For example, $p = 1/2$ if $\gamma$ is estimated by maximizing the conditional likelihood function.

ASSUMPTION R12: $h_n = h\cdot n^{-\mu}$, where $0 < h < \infty$ and $1 - 2p < \mu < p/2$.

Assumption R12 is crucial for establishing the result that follows. This result states that $\hat S_{xx}$, $\hat S_{xv}$, and $\hat S_{x\lambda}$ have the same probability limits as their infeasible counterparts $S_{xx}$, $S_{xv}$, and $S_{x\lambda}$, provided that the bandwidth sequence $h_n$ is chosen appropriately for any given rate of convergence of the first-step estimator, that is, for any given $p$, and for any degree of smoothness $r$.

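LEMMA 2: Let Assumptions R1-R12 hold. Then:

(a) $\hat S_{xx} - S_{xx} = o_p(1)$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}(\hat S_{xv} - S_{xv}) = o_p(1)$ and (ii) $\sqrt{nh_n}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}(\hat S_{xv} - S_{xv}) = o_p(1)$ and (ii) $h_n^{-(r+1)}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.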

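Lemma 2 readily implies that if $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\hat\beta_n - \tilde\beta_n) = o_p(1)$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \tilde\beta_n) = o_p(1)$. Since $(\hat\beta_n - \beta) = (\hat\beta_n - \tilde\beta_n) + (\tilde\beta_n - \beta)$, we have the following theorem.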

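THEOREM 1: Let Assumptions R1-R12 hold.

(a) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then $\sqrt{nh_n}(\hat\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.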

11 Compactness of the parameter space is required for consistency of both Manski's estimator and the smoothed conditional maximum score estimator, while it is not required for the conditional maximum likelihood estimator. Notice, though, that since $\gamma$ can only be estimated up to scale, we can always normalize it so that it lies on the unit circle. Thus, the compactness assumption is not restrictive.


Thus, in the limit, the fact that we are using $\hat\gamma_n$ to estimate $\beta$ does not affect the asymptotic distribution of $\hat\beta_n$. The lower bound on $\mu$ imposed by Assumption R12 is the key for this result to hold. In words, this bound implies that $\beta$ is estimated at a rate slower than $\gamma$. Indeed, from Theorem 1 the rate of convergence of $\hat\beta_n$ is $(nh_n)^{-1/2} \propto n^{-(1-\mu)/2}$, which is obviously slower than $n^{-p}$, since $\mu > 1 - 2p$. Thus, in effect, Assumption R12 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$.

In principle, we could allow $\beta$ to be estimated at the same rate as $\gamma$. Thus, if $\sqrt{nh_n}(\hat\gamma_n - \gamma) = O_p(1)$, for $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ we obtain the following asymptotic representation, which may be easily derived from the analysis of Lemma 2(b) in the Appendix:

$\sqrt{nh_n}(\hat\beta_n - \tilde\beta_n) = \Sigma_{xx}^{-1}\,\Omega\,\sqrt{nh_n}(\hat\gamma_n - \gamma) + o_p(1)$,

where

$\Omega \equiv \operatorname{plim}\ \dfrac{1}{n}\sum_{i=1}^{n}\dfrac{1}{h_n^2}K'\!\left(\dfrac{W_i}{h_n}\right)\Delta x_i'\,\Delta w_i\,\Delta\lambda_i\,\Phi_i$,

provided that $E(\Delta x'\Delta w\,\Lambda\,\Phi \mid W)$ is continuous at $W = 0$ and $\nu K(\nu) \to 0$ as $|\nu| \to \infty$. Asymptotic normality of $\hat\beta_n$ may still be established if $\sqrt{nh_n}(\hat\gamma_n - \gamma)$ has an asymptotic representation of the form $\sqrt{nh_n}(\hat\gamma_n - \gamma) = \frac{1}{\sqrt n}\sum_{i=1}^{n}\psi_{in}(\Delta w_i, \Delta d_i; \gamma) + o_p(1)$.$^{12}$

At first glance it looks attractive to eliminate the asymptotic bias of $\hat\beta_n$ by choosing $h_n$ so that $\tilde h = \lim\sqrt{nh_n}\,h_n^{r+1} = 0$, or equivalently by setting $\mu > 1/(2(r+1)+1)$. In that case, however, the rate of convergence of $\hat\beta_n$ is lower than when $\tilde h > 0$. Indeed, the rate of convergence in distribution of $\hat\beta_n$ is maximized by making $\mu$ as small as possible, that is, by setting $\mu = 1/(2(r+1)+1)$, in which case it becomes $n^{-(r+1)/(2(r+1)+1)}$. Thus, for $r$ large enough, the estimator converges at a rate that can be arbitrarily close to $n^{-1/2}$, provided also that $\gamma$ is estimated fast enough, that is, provided $p > (r+1)/(2(r+1)+1)$.
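The rate arithmetic in the preceding paragraph can be checked mechanically. The following is a minimal sketch, not part of the original paper: the function name `rates` and its arguments are hypothetical, and it only evaluates the exponents $\mu = 1/(2(r+1)+1)$ and $(1-\mu)/2 = (r+1)/(2(r+1)+1)$ together with the requirement that the first-step rate $p$ exceed the latter.

```python
from fractions import Fraction


def rates(r, p=Fraction(1, 2)):
    """Exponents implied by the bandwidth rule mu = 1/(2(r+1)+1).

    r : order of differentiability (the kernel is of order r+1)
    p : first-step rate, i.e. gamma_hat - gamma = O_p(n^{-p})
    Returns (mu, beta_rate, fast_enough); names are illustrative only.
    """
    mu = Fraction(1, 2 * (r + 1) + 1)   # bandwidth h_n = h * n^{-mu}
    beta_rate = (1 - mu) / 2            # (n h_n)^{-1/2} is of order n^{-(1-mu)/2}
    fast_enough = p > beta_rate         # gamma must converge faster than beta
    return mu, beta_rate, fast_enough


for r in (1, 3, 5):
    mu, beta_rate, ok = rates(r)
    print(f"r={r}: mu={mu}, rate exponent={beta_rate}, p=1/2 fast enough: {ok}")
```

For $r = 1$ the sketch returns $\mu = 1/5$ and a rate exponent of $2/5$; as $r$ grows the exponent approaches $1/2$ from below, so a root-n first-step estimator always satisfies the requirement.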

Although the proposed estimator is asymptotically biased, it is possible to eliminate the asymptotic bias while maintaining the maximal rate of convergence, in the manner suggested by Bierens (1987).

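12 We can also derive an asymptotic representation for $\hat\beta_n$ in the case where $\gamma$ is estimated at rate $n^{-p}$ that is slower than $(nh_n)^{-1/2}$. In this case we obtain $n^{p}(\hat\beta_n - \beta) = \Sigma_{xx}^{-1}\Omega\,n^{p}(\hat\gamma_n - \gamma) + o_p(1)$, which implies that $\hat\beta_n$ converges at the same rate as $\hat\gamma_n$, which is slower than the optimal rate obtained for the infeasible estimator $\tilde\beta_n$, that is, when $\gamma$ is known.

COROLLARY: Let $\hat\beta_n$ be the estimator with window width $h_n = h\cdot n^{-1/(2(r+1)+1)}$, and $\hat\beta_{n,\delta}$ the estimator with window width $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, where $\delta \in (0,1)$. Define

$\hat{\hat\beta}_n \equiv \dfrac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\,\hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}}$.

Then $n^{(r+1)/(2(r+1)+1)}(\hat{\hat\beta}_n - \beta) \xrightarrow{d} N(0,\ h^{-1}\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.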
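The linear combination in the Corollary is simple to compute once the two estimates are available. The sketch below is illustrative only: the function name `bias_corrected` and its arguments are hypothetical, and the two input estimates are assumed to have been computed elsewhere with the two bandwidths named in the Corollary.

```python
import numpy as np


def bias_corrected(beta_hat, beta_hat_delta, n, r, delta):
    """Combine two estimates as in the Corollary (illustrative helper).

    beta_hat       : estimate based on bandwidth h * n^{-1/(2(r+1)+1)}
    beta_hat_delta : estimate based on bandwidth h * n^{-delta/(2(r+1)+1)}
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    # Weight n^{-(1-delta)(r+1)/(2(r+1)+1)} from the Corollary.
    weight = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta_hat_delta = np.asarray(beta_hat_delta, dtype=float)
    return (beta_hat - weight * beta_hat_delta) / (1.0 - weight)
```

The design reason for this particular weight is that the leading bias of each estimate is proportional to its bandwidth raised to the power $r+1$, so the weighted difference cancels the leading bias term exactly when the same constant $h$ is used in both bandwidths.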

In order to compute $\hat\beta_n$ or $\hat{\hat\beta}_n$ in an application, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator will likely be more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant $h$. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.

For a given order of differentiability $r$ and a given sample size $n$, the results of Theorem 1 suggest that $h_n = h\cdot n^{-\mu}$ be chosen so that $\mu = 1/(2(r+1)+1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant $h$. A natural way to proceed (see Horowitz (1992) and Härdle (1990)) is to choose $h$ so as to minimize some kind of measure of the distance of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

$\mathrm{MSE} = \mathrm{trace}\left[A\,\Sigma_{xx}^{-1}\left(h^{-1}\Sigma_{xv} + h^{2(r+1)}\Sigma_{x\lambda}\Sigma_{x\lambda}'\right)\Sigma_{xx}^{-1}\right]$

for any nonstochastic positive semidefinite matrix $A$ that satisfies $\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda} \neq 0$. It is straightforward to show that MSE is minimized by setting

(3.21) $h = h^* = \left[\dfrac{\mathrm{trace}\left[\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{xv}\right]}{2(r+1)\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda}}\right]^{1/(2(r+1)+1)}$.
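Equation (3.21) translates directly into a few lines of linear algebra. The following sketch assumes that estimates of the three matrices are already available; the function name `optimal_bandwidth_constant`, its argument names, and the default choice of $A$ as the identity matrix are assumptions made here for illustration and are not prescribed by the paper.

```python
import numpy as np


def optimal_bandwidth_constant(sigma_xx, sigma_xv, sigma_xl, r, A=None):
    """Evaluate the constant h* of equation (3.21) (illustrative helper).

    sigma_xx, sigma_xv : (k, k) matrices; sigma_xl : length-k vector.
    A defaults to the identity matrix, which is an assumption of this sketch.
    """
    sigma_xx = np.atleast_2d(np.asarray(sigma_xx, dtype=float))
    sigma_xv = np.atleast_2d(np.asarray(sigma_xv, dtype=float))
    sigma_xl = np.asarray(sigma_xl, dtype=float).reshape(-1, 1)
    k = sigma_xx.shape[0]
    A = np.eye(k) if A is None else np.atleast_2d(np.asarray(A, dtype=float))

    sxx_inv = np.linalg.inv(sigma_xx)
    # Numerator: trace[Sxx^{-1} A Sxx^{-1} Sxv] (variance part of the MSE).
    variance_term = np.trace(sxx_inv @ A @ sxx_inv @ sigma_xv)
    # Denominator: Sxl' Sxx^{-1} A Sxx^{-1} Sxl (squared-bias part of the MSE).
    bias_term = (sigma_xl.T @ sxx_inv @ A @ sxx_inv @ sigma_xl).item()
    if bias_term <= 0.0:
        raise ValueError("the bias term must be strictly positive")
    return (variance_term / (2 * (r + 1) * bias_term)) ** (1.0 / (2 * (r + 1) + 1))
```

In the scalar case the formula is the first-order condition of $V/h + B\,h^{2(r+1)}$, where $V$ is the variance term and $B$ the squared-bias term, which gives $h^* = [V/(2(r+1)B)]^{1/(2(r+1)+1)}$.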

This last expression suggests that we may construct a consistent estimate of $h^*$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{xv}$ and $\Sigma_{x\lambda}$.$^{13}$

THEOREM 2: Assume that Assumptions R1-R12 hold. (a) Let $\hat\beta_n$ be a consistent estimator of $\beta$ based on $h_n = h\cdot n^{-1/(2(r+1)+1)}$, and define $\hat e_i = \Delta y_i - \Delta x_i\hat\beta_n$. Then

(b) Let $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then, for $\hat e_i$ defined as in part (a),

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's world wide web page.

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given $r$ and $n$, choose any $h_n = h\cdot n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, with $h$ an arbitrary positive constant and $0 < \delta < 1$. Compute $\hat\beta_n$ based on $h_n$ and construct $\hat e_i$ as defined in Theorem 2. Use $\hat e_i$ to compute the estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ as discussed above. Then estimate $h^*$ by $\hat h^*$ using equation (3.21), with $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h^*$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the plug-in method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant $h$. Second, by specifying the order of differentiability $r$, the researcher is restricted to a certain smoothness class.
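Both stages of this procedure rely on the same building block: a kernel-weighted least squares fit of $\Delta y_i$ on $\Delta x_i$ over individuals observed in both periods. A minimal sketch of that building block is given below. The function and argument names are hypothetical, the default kernel is the standard normal density, and the standard errors use the usual heteroskedasticity-robust sandwich formula for a weighted regression; this is one way to organize the computation, not necessarily the author's implementation.

```python
import numpy as np


def normal_kernel(v):
    """Standard normal density, a second order kernel."""
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)


def kernel_weighted_ols(dy, dx, dw, d1, d2, gamma_hat, h_n, kernel=normal_kernel):
    """Kernel-weighted least squares on first differences (illustrative).

    dy : (n,) differenced outcomes      dx : (n, k) differenced regressors
    dw : (n, q) differenced selection regressors
    d1, d2 : (n,) selection indicators for the two periods
    Returns (beta_hat, robust standard errors).
    """
    dy = np.asarray(dy, dtype=float)
    n = dy.shape[0]
    dx = np.asarray(dx, dtype=float).reshape(n, -1)
    dw = np.asarray(dw, dtype=float).reshape(n, -1)

    # Keep only individuals selected in both periods (d1 * d2 = 1).
    selected = (np.asarray(d1) * np.asarray(d2)) == 1
    dy, dx, dw = dy[selected], dx[selected], dw[selected]

    # Kernel weights (1/h) K(dw * gamma_hat / h): large when the index barely changes.
    index = dw @ np.atleast_1d(np.asarray(gamma_hat, dtype=float))
    weights = kernel(index / h_n) / h_n

    xtwx = dx.T @ (weights[:, None] * dx)
    beta = np.linalg.solve(xtwx, dx.T @ (weights * dy))

    # Sandwich covariance: (X'WX)^{-1} [sum w_i^2 e_i^2 x_i x_i'] (X'WX)^{-1}.
    resid = dy - dx @ beta
    meat = dx.T @ (((weights * resid) ** 2)[:, None] * dx)
    bread = np.linalg.inv(xtwx)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))
```

Calling the function twice, once with $h_n$ and once with $h_{n,\delta}$, gives the two estimates needed for the bias correction. Writing the fit in weighted form rather than transforming the data by the square root of the weights also accommodates higher order kernels, whose weights can be negative.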

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h\cdot n^{-1/(2(r+1)+1)}$, run an OLS regression of $\hat y_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta y_i\Phi_i$ on $\hat x_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta x_i\Phi_i$, and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix, using the residuals from the regression, $\hat e_i = \hat y_i - \hat x_i\hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\hat y_i$ on $\hat x_i$ with bandwidth $h_n = h_{n,\delta}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$ in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency$^{14}$ of $\hat\beta_n$ if $h_n^{-2}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e., for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$. The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are

satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-n asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3}h_n^2 \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator which, under weak (but stronger than Manski's) assumptions, is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness

14 Consistency of p may be established under the weaker restriction that zllF - yll = o(l) The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion This modification does not alter the basic restriction for obtaining an asymptotic distribution for 6which does not depend on the estimation of y in the first step namely that y has to be estimated at a faster rate than p Notice that in this case the upper bound on u in Assumption R12 would have to be replaced by ( 6p- 1)7 However this modification would affect the proof of Theorem 2 which would become unnecessarily complicated and long


we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

4 MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, the first step estimation method, the performance in practice of the proposed plug-in method for estimating the bandwidth constant, and finally the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where p O = 1 y = y = 1 w and w2 are independent N( -1 l ) variables q = (w + w)2 + 25 with 5 an independent variable distributed uni- formly over the interval (01) u is logistically distributed normalized to have variance equal to 1 x= w a = + w )2 + 5 with 5 an indepen- (w dent N(0 2) variable and s = 08t3 + 06ul with 5 an independent standard normal variable All data are generated iid across individuals and over time This design implies that Pr(d + d = 1)= 037 and Pr(d = d = 1) = 031 so that approximately 37 percent of each sample is used in the first step estimation of the selection equation and approximately 31 percent in the second step Each Monte Carlo experiment is performed 1000 times while the same pseudoran- dom number sequences are used for each one of three different sample sizes n 250 1000 and 4000

Table I presents the finite sample properties of the "naive" estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences, using only those individuals that are selected into the sample in both time periods, i.e., those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator, with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median


TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$. Columns: Mean Bias, Median Bias, RMSE, MAD.

Panel B: Sizes of t Tests. Columns (significance levels): 0.01, 0.05, 0.10, 0.20.

bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$, at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(\nu) = \phi(\nu)$, where $\phi$ is the density of the standard normal distribution,

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$

Left panel: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right panel: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).

Columns in each panel: Mean Bias, Median Bias, RMSE, MAD

P a n e l A True y 02427 01625 00018 01368 0 0924 00078 00792 00511 00024

Panel B qL 02076 01438 00145 01169 00778 00117 00672 00455 0 0059

P a n e l C 02592 01725 -00021 01435 00950 -00026 00826 00544 -00005

P a n e l D cws4 01780 01255 00327 01063 00703 00106 00629 00410 -00139

P a n e l E qscnlsr 01765 01242 00361 01071 00721 00146 00659 00416 -00098


which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h\cdot n^{-1/(2(r+1)+1)} = h\cdot n^{-1/5}$ with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.$^{15}$ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_L$, which in this case will be consistent since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,$^{16}$ denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator,$^{17}$ denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen from the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E), the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window

15 In the construction of the kernel weights of both the infeasible estimator $\tilde\beta_n$ of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i}\Delta d_i\,1\{\Delta w_{1i}g_1 + \Delta w_{2i}g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.
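The grid search described in footnote 16 can be sketched in a few lines. The code below is an illustration under stated assumptions: the function name `conditional_max_score` and its arguments are hypothetical, there are exactly two selection regressors, and the score is taken to be the sample mean of $\Delta d_i\,1\{\Delta w_{1i}g_1 + \Delta w_{2i}g_2 \ge 0\}$ over unit-norm directions $(\sin g, \cos g)$.

```python
import numpy as np


def conditional_max_score(dd, dw, n_grid=2000):
    """Grid search for a conditional maximum score estimate (illustrative).

    dd : (n,) differences d_i2 - d_i1, taking values in {-1, 0, 1}
    dw : (n, 2) differences of the two selection regressors
    Returns the maximizing unit-norm direction and its score.
    """
    dd = np.asarray(dd, dtype=float)
    dw = np.asarray(dw, dtype=float)
    # Equispaced grid of angles on [0, 2*pi]; each angle gives (sin g, cos g).
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid)
    candidates = np.column_stack((np.sin(angles), np.cos(angles)))
    # index[i, j] = dw_i . candidate_j for every observation and grid point.
    index = dw @ candidates.T
    scores = (dd[:, None] * (index >= 0)).mean(axis=0)
    best = int(np.argmax(scores))
    return candidates[best], float(scores[best])
```

Because the objective is a step function of the angle, a set of neighbouring grid points typically attains the same maximum; `np.argmax` returns the first of them, which is one arbitrary way to break ties.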

17 The SCMS estimates are computed by maximizing the smoothed objective function over all $g \in \mathbb{R}^2$ that have one component normalized to absolute value one and the other in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(\nu) = K_4(\nu)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(\nu) = \Phi(\nu)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5\cdot n^{-1/(2m+1)}$ ($m = 2$ or $4$), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).


width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\tilde\beta_n$ and its bias-corrected version $\tilde{\tilde\beta}_n$ using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).$^{18}$

In Panels C and D we use a fourth and a sixth order bias-reducing kernel,$^{19}$ and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h^*$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h^*$ is larger than the initial constant (here the initial bandwidth constant is set equal to one$^{20}$). Table V displays the mean of $\hat h^*$ across 1000 replications for different specifications of the initial constant, for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

19 The fourth-order kernel is $K_4(\nu) = 1.1\exp(-\nu^2/2) - 0.1\exp(-\nu^2/(2\cdot 11))(1/\sqrt{11})$ and the sixth-order kernel is $K_6(\nu) = 1.5\exp(-\nu^2/2) - 0.6\exp(-\nu^2/(2\cdot 4))(1/\sqrt{4}) + 0.1\exp(-\nu^2/(2\cdot 9))(1/\sqrt{9})$. See Bierens (1987).
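The order of these kernels can be verified numerically by checking the moment conditions of Assumption R7(e). The sketch below does so with a plain trapezoid rule. Two points are assumptions of this sketch rather than statements from the paper: the kernels are rescaled by $1/\sqrt{2\pi}$ so that they integrate to one as Assumption R7(a) requires, and the function names (`phi`, `k4`, `k6`, `moment`) are hypothetical.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)


def phi(v):
    """Standard normal density (second order kernel)."""
    return np.exp(-0.5 * v ** 2) / SQRT_2PI


def k4(v):
    """Fourth-order kernel of footnote 19, rescaled by 1/sqrt(2*pi) (assumed)."""
    return (1.1 * np.exp(-v ** 2 / 2.0)
            - 0.1 * np.exp(-v ** 2 / (2.0 * 11.0)) / np.sqrt(11.0)) / SQRT_2PI


def k6(v):
    """Sixth-order kernel of footnote 19, rescaled by 1/sqrt(2*pi) (assumed)."""
    return (1.5 * np.exp(-v ** 2 / 2.0)
            - 0.6 * np.exp(-v ** 2 / (2.0 * 4.0)) / np.sqrt(4.0)
            + 0.1 * np.exp(-v ** 2 / (2.0 * 9.0)) / np.sqrt(9.0)) / SQRT_2PI


def moment(kernel, j, lo=-60.0, hi=60.0, num=240001):
    """Approximate the j-th moment of the kernel with the trapezoid rule."""
    v = np.linspace(lo, hi, num)
    y = v ** j * kernel(v)
    step = v[1] - v[0]
    return float(np.sum(0.5 * (y[1:] + y[:-1])) * step)


for name, kernel in (("phi", phi), ("k4", k4), ("k6", k6)):
    print(name, [round(moment(kernel, j), 6) for j in range(7)])
```

With this scaling, the first nonzero moment beyond the zeroth is the second for the normal density, the fourth for `k4`, and the sixth for `k6`, which corresponds to $r = 1$, $r = 3$, and $r = 5$ in the notation of Assumption R7.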

20 We chose the initial h equal to one as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 05 to 10 was performed for a sample size n = 100000


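TABLE III

FINITE SAMPLE PROPERTIES OF $\tilde\beta_n$ AND $\tilde{\tilde\beta}_n$; TRUE $\gamma$

Left panel: $\tilde\beta_n$ (Without Asymptotic Bias Correction). Right panel: $\tilde{\tilde\beta}_n$ (With Asymptotic Bias Correction).

Columns in each panel: Mean Bias, Median Bias, RMSE, MAD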

P a n e l A K(v)= 4(v)h= 0 5 n 1 00040 03463 02140 -00017 00065 00064 01930 01308 00053 00023 00002 01119 00752 -00005 -00014

Panel B ~ ( v )= 4(v)h = 3 n 1 1 5 00631 01550 01097 00542 00566 00459 00933 00626 00435 00426 00351 00565 00418 00316 00321

Panel C K(v)= h n 1 l 9Kj(v) =

00246 01966 01390 00080 00121 00159 01067 00723 00099 00003 00159 00582 00397 00051 00054

P a n e l D K(v)= K(v)h = n113 00269 01973 01362 00002 00030 00144 01041 00719 00032 -00031 00170 00560 00391 -00006 -00002

a The estimated standard errors of the mean bias estimates for n = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

The estimated standard errors of the median bias estimates for n = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

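TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = \hat h^*\cdot n^{-1/5}$, INITIAL $h = 1$, $K(\nu) = \phi(\nu)$

Left panel: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right panel: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).

Columns in each panel: Mean Bias, Median Bias, RMSE, MAD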

P a n e l A True y 01919 01287 00261 01053 00700 00330 00653 00507 00273

Panel B TL 01703 01191 00454 01000 00693 00465 00654 00504 00385

Panel C TcMs 02117 01329 00221 01114 00718 00246 00671 00507 00246

D S C M S ~ 01543 01086 00705 01004 00740 00604 00658 00488 00401


TABLE V

Columns: Initial h = 0.5, Initial h = 1, Initial h = 2, Initial h = 3

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h^*$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$. Panels 1a-1c use the true $\gamma$. Note: Figures 1a, 1d, 1g: n = 250; Figures 1b, 1e, 1h: n = 1000; Figures 1c, 1f, 1i: n = 4000.


estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

Finally, we examine the size of t tests, where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$. To this end we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that, although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.

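TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$

Left panel: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right panel: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).

Columns in each panel (significance levels): 0.01, 0.05, 0.10, 0.20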

Panel A True y 01610 02530 00590 01240 02180 00260 01120 02260 00210

Panel B TL 01580 02680 00450 01160 02140 00230 01140 02250 00180

Panel C Scnfs 01600 02720 00610 01170 02160 00350 01180 02390 00240

Panel D SScMS 01430 02570 00280 01220 02250 00190 01230 02430 00250


5 CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMAAl Let S = is a random sam- (ln)Z=l ( l h )L (M h )Z v s 2 0 where (Z y)]= ple from a disirrbuiron that has E ( I Z I ~ I for L ~at~sfiesW )ltM lt almost all W and the functron

P ~ l v(v ) l d v lt M Then E ( S ) = O ( k i ) and var(S) = O(hnh) Tlzus for s 2 1 S + 0 while for

P s = 0 S +f(O)E(ZI W = O)lL(v)d v procrded that E ( Z I W ) rs contrnuo~ts at W = 0


PROOF: Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

LEMMAA2 (Liapounov CLT for doublc arrays) Let = (1 l t l )~= I tiwhere an Independent sequence of scalar random ~arrables that satis$es E( (I0 var( (I lt rn var(= +

V lt aand I3= El ( 61 +0 for some 8 E (01) as n + Then Jizh~N(0 V)

PROOF: See Theorem 7.1.2 and the comment on page 209 in Chung (1973).

COROLLARY = where (Z U)l= 1s a random sample from a Al Let ( ( I amp)L(w~)z d~stnbutlonsuch that E(ZI W) = 0 and E(IZI 1 W) lt M lt w for almost all W E(Z2 I W) IS conhnuous at W = 0 and the functlon L satrsfies llL(v)l dv lt 53 Then KS= ( l ix)~l=amp N(0

f W ( 0 ) ~ ( Z 2 I ~ =O ) ~ L ( V ) ~ ~ V )

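PROOF OF LEMMA 1: (a) Apply Lemma A1 with $Z_i = \Delta x_{j,i}\Delta x_{l,i}\Phi_i$ ($j, l = 1, \ldots, k$), $s = 0$, and $L(v) = K(v)$.

(b-i) Apply Lemma A2 with $\xi_{in} = c'(1/\sqrt{h_n})K(W_i/h_n)\Delta x_i'\Delta\nu_i\Phi_i$, where $c$ is a $k \times 1$ vector of constants such that $c'c = 1$.

(b-ii) Note that by Assumption R5, $\Delta\lambda_i = \tilde\Lambda_i W_i$. Thus we may write

$S_{x\lambda} = \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K(W_i/h_n)\,W_i\,\Delta x_i'\tilde\Lambda_i\Phi_i.$

Therefore $E(S_{x\lambda}) = \int (1/h_n)K(W/h_n)\,W\,g(W)\,dW$, where $g(W) \equiv E(\Delta x'\tilde\Lambda\Phi \mid W)f_W(W)$ is by assumption $r$ times continuously differentiable with derivatives that are bounded on the support of $W$, and has $g(0) < \infty$. A Taylor series expansion of $g(\cdot)$ around 0 and a change of variables $W = v h_n$ lead to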


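for some $c_i^*$ lying between 0 and $W_i$, since $\int v^j K(v)\,dv = 0$ for $j = 1, \ldots, r$. Therefore, by bounded convergence,

since under our assumptions $\int |v|^{r+1}|K(v)|\,dv < \infty$ and by assumption $\sqrt{n h_n}\,h_n^{r+1} \to \tilde h$. Furthermore, by Lemma A1, $\mathrm{var}(S_{x\lambda}) = O(h_n^2/(n h_n))$, which implies that $\mathrm{var}(\sqrt{n h_n}\,S_{x\lambda}) = O(n h_n)\,O(h_n/n) = O(h_n^2) = o(1)$. Hence $\sqrt{n h_n}\,S_{x\lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\lambda}$.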

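(c-i) Note that

while by Lemma A1, $\mathrm{var}(S_{x\nu}) = O((n h_n)^{-1})$. Therefore $E(h_n^{-(r+1)}S_{x\nu}) = 0$ and $\mathrm{var}(h_n^{-(r+1)}S_{x\nu}) = O(h_n^{-2(r+1)}(n h_n)^{-1}) = O((\sqrt{n h_n}\,h_n^{r+1})^{-2}) = o(1)$, since by assumption $\sqrt{n h_n}\,h_n^{r+1} \to \infty$ as $n \to \infty$. Thus $h_n^{-(r+1)}S_{x\nu} \xrightarrow{p} 0$.

(c-ii) From part (b-ii) above,

and

since $n h_n^{2(r+1)+1} \to \infty$ implies that $n h_n^{2r+1} \to \infty$. Thus $h_n^{-(r+1)}S_{x\lambda} \xrightarrow{p} \Sigma_{x\lambda}$.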

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$.

(iii) In the Taylor series expansions, $c_i^*$ stands for a generic value between $\hat W_i$ and $W_i$.

PROOF OF LEMMA 2: (a) By a Taylor series expansion we can write

Therefore,

since by assumption $\mu < p/2$, $|K'(v)| < M$, and $E(\|\Delta w\|\,\|\Delta x\|^2) < \infty$.


(b-i) Let $\hat S^l_{x\nu}$ and $S^l_{x\nu}$ denote the $l$th ($l = 1, \ldots, k$) elements of $\hat S_{x\nu}$ and $S_{x\nu}$, respectively. A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

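We will show that $A_1$ and $A_2$ are $O_p(1)$, while $A_3 = o_p(1)$. The desired result will then follow from the fact that $\mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$.

Let $A_1^j$ be the $j$th element ($j = 1, \ldots, q$) of the $(1 \times q)$ vector $A_1$. Write $A_1^j = (1/\sqrt{n})\sum_{i=1}^n \xi_{in}$, where $\xi_{in} = (1/\sqrt{h_n})K'(W_i/h_n)\Delta x_{l,i}\Delta\nu_i\Delta w_{j,i}\Phi_i$. Note that $\{\xi_{in}\}_{i=1}^n$ is a sequence of scalar random variables that satisfies the requirements of Lemma A2, since under our assumptions $E(|\Delta x_l\Delta\nu\Delta w_j|^{2+\delta} \mid W) < \infty$ for almost all $W$, while $|K'(v)| < \infty$ and $\int |K'(v)|\,dv < \infty$ imply that $\int |K'(v)|^{2+\delta}\,dv < \infty$. Therefore $A_1^j$ is bounded in probability.

Similarly, we can show that the $(j,m)$th element ($j, m = 1, \ldots, q$) of the $(q \times q)$ matrix $A_2$ is also bounded in probability, by defining $\xi_{in} = (1/\sqrt{h_n})K''(W_i/h_n)\Delta x_{l,i}\Delta\nu_i\Delta w_{j,i}\Delta w_{m,i}\Phi_i$, since $E(|\Delta x_l\Delta\nu\Delta w_j\Delta w_m|^{2+\delta} \mid W) < \infty$ for almost all $W$, and the boundedness and absolute integrability of $K''(v)$ imply that $\int |K''(v)|^{2+\delta}\,dv < \infty$.

Next, observe that

$\|A_3\| \le M\,\frac{1}{\sqrt{n h_n}}\,\frac{\|\hat\gamma_n - \gamma\|^3}{h_n^3}\sum_{i=1}^n |\Delta x_{l,i}\Delta\nu_i|\,\|\Delta w_i\|^3\,\Phi_i = O_p\big(n^{(1/2) + (7\mu/2) - 3p}\big) = o_p(1),$

since $p > 2/5$ and $\mu < p/2$ imply that $(1/2) + (7\mu/2) - 3p < 0$.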

(b-ii) Let $\hat S^l_{x\lambda}$ and $S^l_{x\lambda}$ denote the $l$th ($l = 1, \ldots, k$) elements of $\hat S_{x\lambda}$ and $S_{x\lambda}$, respectively. A third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

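We will show that $B_1$ and $B_2$ are $O_p(1)$, while $B_3 = o_p(1)$. The desired result will then follow from the fact that $1 - 2p < \mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$ and $\sqrt{n h_n}(\hat\gamma_n - \gamma) = O_p(n^{(1-\mu)/2 - p}) = o_p(1)$.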


Note that $B_1$ is a $(1 \times q)$ row-vector. For its $j$th element,

application of Lemma A1 with $s = 1$, $Z_i = \Delta x_{l,i}\tilde\Lambda_i\Delta w_{j,i}\Phi_i$, and $L(v) = K'(v)$ yields $E(B_1^j) = (1/h_n)\,O(h_n) = O(1)$ and

since $E(\Delta x_l^2\tilde\Lambda^2\Delta w_j^2 \mid W) < \infty$ for almost all $W$ and $\int |vK'(v)|^2\,dv < \infty$.

Similarly, we can show that the $(j,m)$th element ($j, m = 1, \ldots, q$) of the $(q \times q)$ matrix $B_2$ is also bounded in probability, since $E(\Delta x_l^2\tilde\Lambda^2\Delta w_j^2\Delta w_m^2 \mid W) < \infty$ for almost all $W$ and $\int |vK''(v)|^2\,dv < \infty$.

Next, observe that

since under our assumptions $(1/2) + (7\mu/2) - 3p < 0$, $\gamma$ lies in a compact set, and $E(\|\Delta x\|\,\|\Delta w\|^4) < \infty$.

(c-i) Note that with $h_n = h\cdot n^{-\mu}$, the condition $n h_n^{2(r+1)+1} \to \infty$ implies that $\mu < 1/(2(r+1)+1)$. In what follows we will use the fact that for $r \ge 1$

Define $\hat S^l_{x\nu}$ and $S^l_{x\nu}$ as before. A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+

where $A_1$ and $A_2$ are defined as in the proof of part (b-i). As we showed there, both these quantities are bounded in probability for any $h_n$ that satisfies $h_n \to 0$ and $n h_n \to \infty$ as $n$ increases. Furthermore, from (1) above, $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$. Thus the first two terms of the sum above are $o_p(1)$. Now, by (2),

(c-ii) Let $\hat S^l_{x\lambda}$ and $S^l_{x\lambda}$ be defined as before. A third order Taylor series expansion yields

where $B_1$ and $B_2$ are defined as in the proof of part (b-ii), and as we showed there they are bounded in probability for any $h_n$ that satisfies $n h_n \to \infty$ as $n$ increases. Thus the first two terms of the sum above are $o_p(1)$. Furthermore,

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

CHAMBERLAIN, G. (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

HECKMAN, J. J. (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

HECKMAN, J. J. (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

HONORÉ, B. E. (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

KYRIAZIDOU, E. (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

MANSKI, C. (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

MANSKI, C. (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

POWELL, J. L. (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

RASCH, G. (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.


The above discussion, which presumes knowledge of the true $\gamma$, suggests estimating $\beta$ by OLS from a subsample that consists of those observations that have $w_{i1}\gamma = w_{i2}\gamma$ and $d_{i1} = d_{i2} = 1$. Defining $\Psi_i \equiv 1\{w_{i1}\gamma = w_{i2}\gamma\}$, $\Phi_i \equiv 1\{d_{i1} = d_{i2} = 1\} \equiv d_{i1}d_{i2}$, and with $\Delta$ denoting first differences, the OLS estimator is of the form $\hat\beta_n = [\sum_{i=1}^n \Delta x_i'\Delta x_i\Psi_i\Phi_i]^{-1}[\sum_{i=1}^n \Delta x_i'\Delta y_i\Psi_i\Phi_i]$. Under appropriate regularity conditions, this estimator will be consistent and root-$n$ asymptotically normal. An obvious requirement is that $\Pr(\Delta w_i\gamma = 0) > 0$, which may be satisfied, for example, when all the random variables in $w_{it}$ are discrete, or in experimental cases where the distribution of $w_{it}$ is in the control of the researcher, situations that are rare in economic applications.

Of course, this estimation scheme cannot be directly implemented since $\gamma$ is unknown. Furthermore, as argued above, it may be the case that $\Psi_i = 0$ (i.e. $\Delta w_i\gamma \ne 0$) for all individuals in our sample. Notice, though, that if $\lambda$ is a sufficiently smooth function and $\hat\gamma_n$ is a consistent estimate of $\gamma$, observations for which the difference $\Delta w_i\hat\gamma_n$ is close to zero should also have $\Delta\lambda_i \cong 0$, and the preceding arguments would hold approximately.

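We therefore propose the following two-step estimation procedure, which is in the spirit of Powell (1987) and Ahn and Powell (1993). In the first step, $\gamma$ is consistently estimated based on equation (2.2) alone. In the second step, the estimate $\hat\gamma_n$ is used to estimate $\beta$, based on those pairs of observations for which $w_{i1}\hat\gamma_n$ and $w_{i2}\hat\gamma_n$ are "close." Specifically, we propose

(2.3) $\hat\beta_n = \Big[\sum_{i=1}^n \hat\psi_{in}\,\Delta x_i'\Delta x_i\,\Phi_i\Big]^{-1}\Big[\sum_{i=1}^n \hat\psi_{in}\,\Delta x_i'\Delta y_i\,\Phi_i\Big],$

where $\hat\psi_{in}$ is a weight that declines to zero as the magnitude of the difference $|w_{i1}\hat\gamma_n - w_{i2}\hat\gamma_n|$ increases. We choose "kernel" weights of the form

(2.4) $\hat\psi_{in} = \frac{1}{h_n}K\Big(\frac{\Delta w_i\hat\gamma_n}{h_n}\Big),$

where $K$ is a "kernel density" function and $h_n$ is a sequence of "bandwidths" which tends to zero as $n \to \infty$. Thus, for a fixed (nonzero) magnitude of the difference $|\Delta w_i\hat\gamma_n|$, the weight $\hat\psi_{in}$ shrinks as the sample size increases, while for a fixed $n$, a larger $|\Delta w_i\hat\gamma_n|$ corresponds to a smaller weight.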
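The second step can be illustrated with a short numerical sketch. The code below is only an illustration under stated assumptions, not the paper's implementation: it takes a first-step estimate as given, uses the standard normal density as the kernel, and the function names `gaussian_kernel` and `kernel_weighted_fd` and their argument layout are hypothetical.

```python
import numpy as np


def gaussian_kernel(v):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)


def kernel_weighted_fd(dy, dx, dw, d1, d2, gamma_hat, h):
    """Kernel-weighted first-difference estimate of beta.

    dy: (n,) differenced outcomes; dx: (n, k) differenced regressors;
    dw: (n, q) differenced selection regressors; d1, d2: (n,) 0/1 selection
    indicators for the two periods; gamma_hat: (q,) first-step estimate;
    h: bandwidth.
    """
    phi = d1 * d2                                     # both periods observed
    psi = gaussian_kernel(dw @ gamma_hat / h) / h     # kernel weights
    wgt = psi * phi
    sxx = (dx * wgt[:, None]).T @ dx                  # weighted sum of dx' dx
    sxy = (dx * wgt[:, None]).T @ dy                  # weighted sum of dx' dy
    return np.linalg.solve(sxx, sxy)
```

Pairs whose estimated index difference is far from zero relative to the bandwidth receive weights near zero, so the estimate is driven by the pairs for which the selection effect approximately differences out.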

It is interesting to note that the arguments used in estimating the main regression equation may be modified to accommodate the case of a truncated sample, that is, when we only observe those individuals that have $d_{it} = 1$ for all time periods. Recall that our method for eliminating the sample selection effect from equation (2.1) is based on the fact that, under certain distributional assumptions, $\Delta w_i\gamma = 0$ implies $\Delta\lambda_i = 0$. However, $\Delta w_i = 0$ also implies $\Delta\lambda_i = 0$. In other words, we might dispense altogether with the first step of estimating $\gamma$ and estimate $\beta$ from those observations for which $w_{i1}$ and $w_{i2}$ are "close," which would suggest using the weights $\hat\psi_{in} = (1/h_n)K(\Delta w_i/h_n)$. Although this approach would imply a slower rate of convergence for the resulting estimator, this estimation scheme may be used for estimating $\beta$ from a truncated sample, in which case estimation of the selection equation is infeasible. An obvious drawback of this method is that, in order to consistently estimate the entire parameter vector $\beta$, we would have to impose the restriction that $w_{it}$ and $x_{it}^*$ do not contain any elements in common.

The above analysis extends naturally to the case of a longer (and possibly unbalanced) panel, that is, when $T_i \ge 2$. Then $\beta$ could be estimated from those observations that have $d_{it} = d_{is} = 1$ and for which $w_{it}\hat\gamma_n$ and $w_{is}\hat\gamma_n$ are "close," for all $s, t = 1, \ldots, T_i$. The estimator is of the form

where

In the following section we derive the asymptotic properties of our proposed estimator for the main equation of interest under the assumption that $\gamma$ has been consistently estimated. At the end of the section we examine the applicability of existing estimators for obtaining first-step estimates of the selection equation.

3. ESTIMATION OF THE MAIN EQUATION

3.1. Asymptotic Properties of the Estimator

The derivation of the large sample properties of $\hat\beta_n$ of equations (2.3) and (2.4) proceeds in two steps. First, the asymptotic behavior of the infeasible estimator, which uses the true $\gamma$ in the construction of the kernel weights, denoted by $\tilde\beta_n$, is analyzed. Then the large sample behavior of the difference $(\hat\beta_n - \tilde\beta_n)$ is investigated.

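It will be useful to define the scalar index $W_i \equiv \Delta w_i\gamma$ and its estimated counterpart $\hat W_i \equiv \Delta w_i\hat\gamma_n$, along with the following quantities:

$S_{xx} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\Big(\frac{W_i}{h_n}\Big)\Delta x_i'\Delta x_i\Phi_i,\quad S_{x\nu} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\Big(\frac{W_i}{h_n}\Big)\Delta x_i'\Delta\nu_i\Phi_i,\quad S_{x\lambda} \equiv \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n}K\Big(\frac{W_i}{h_n}\Big)\Delta x_i'\Delta\lambda_i\Phi_i,$

with $\hat S_{xx}$, $\hat S_{x\nu}$, and $\hat S_{x\lambda}$ defined in the same way but with $\hat W_i$ in place of $W_i$.

With these definitions, we can write $\tilde\beta_n - \beta = S_{xx}^{-1}(S_{x\nu} + S_{x\lambda})$ and $\hat\beta_n - \beta = \hat S_{xx}^{-1}(\hat S_{x\nu} + \hat S_{x\lambda})$.

Our asymptotic results for the infeasible estimator are based on the following assumptions. From Section 2, $\Phi_i \equiv d_{i1}d_{i2}$, $\zeta_i \equiv (w_{i1}, w_{i2}, x_{i1}^*, x_{i2}^*, \alpha_i, \eta_i)$, and $\nu_{it} \equiv d_{it}\varepsilon_{it}^* - E(\varepsilon_{it}^* \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$.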

ASSUMPTION R1: $(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2})$ and $(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$. That is, $F(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1} \mid \zeta_i)$.

As discussed in Section 2, this "conditional exchangeability" assumption is crucial to our method for eliminating the sample selection effect. Although in principle we could allow $F$ to vary across individuals, it will be convenient for our analysis to assume that cross-section sampling is random.

ASSUMPTION R2: An i.i.d. sample $\{(x_{it}^*, \varepsilon_{it}^*, \alpha_i, w_{it}, \eta_i, u_{it});\ t = 1, 2\}_{i=1}^n$ is drawn from the population. For each $i = 1, \ldots, n$ and each $t = 1, 2$, we observe $(d_{it}, w_{it}, y_{it}, x_{it})$.

With this assumption we may from now on drop the subscripts $i$ that denote the identity of each panel member.

ASSUMPTION R3: $E(\Delta x'\Delta x\,\Phi \mid W = 0)$ is finite and nonsingular.

Note that this assumption implicitly imposes an exclusion restriction on the set of regressors, namely that at least one of the variables in the selection equation, $w_{it}$, is not contained in $x_{it}^*$.

ASSUMPTION R4: The marginal distribution of the index function $W \equiv \Delta w\gamma$ is absolutely continuous, with density function $f_W$, which is bounded from above on its support and strictly positive at zero, i.e. $f_W(0) > 0$. In addition, $f_W$ is almost everywhere $r$ times ($r \ge 1$) continuously differentiable and has bounded derivatives.

Observe that, by definition, $\Delta x_i\Phi_i = \Phi_i\Delta x_i^*$. Thus, although certain assumptions are stated in terms of the observed regressors $x$, they also hold for the latent (possibly unobserved) $x^*$.

It is possible to relax certain smoothness assumptions so that they hold only in a neighborhood of $W$ near zero, at the cost, though, of more technical detail.

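ASSUMPTION R5: The unknown function $\Lambda(w_t\gamma + \eta, w_\tau\gamma + \eta, \zeta) \equiv E(\varepsilon_t^* \mid d_1 = 1, d_2 = 1, \zeta) \equiv E(\varepsilon_t^* \mid u_t \le w_t\gamma + \eta, u_\tau \le w_\tau\gamma + \eta, \zeta) \equiv \Lambda(s_t, s_\tau, \zeta)$ satisfies $\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \tilde\Lambda\cdot(s_t - s_\tau)$ for $t, \tau = 1, 2$, where $\tilde\Lambda$ is a function of $(s_t, s_\tau, \zeta)$, i.e. $\tilde\Lambda \equiv \tilde\Lambda(s_t, s_\tau, \zeta)$, which is bounded on its support.

This assumption is crucial to our analysis. It will be satisfied, for example, if $\Lambda$ is continuously differentiable with respect to its first two arguments, with bounded first-order partial derivatives (as, for example, when the errors are jointly normally distributed), in which case we may apply the multivariate mean-value theorem:

$\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \Lambda^{(1)}(c^*)\,(s_t - s_\tau) + \Lambda^{(2)}(c^*)\,(s_\tau - s_t).$

Here $\Lambda^{(j)}$ ($j = 1, 2$) denotes the first-order partial derivative of $\Lambda$ with respect to its first and second argument, respectively, and $c^*$ lies on the line segment connecting $(w_t\gamma + \eta, w_\tau\gamma + \eta)$ and $(w_\tau\gamma + \eta, w_t\gamma + \eta)$. Thus, in this case, $\tilde\Lambda = \Lambda^{(1)}(c^*) - \Lambda^{(2)}(c^*)$, and by assumption it will be bounded.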

ASSUMPTION R6: (a) $x_t^*$ and $\varepsilon_t^*$ have bounded $4 + 2\delta$ moments conditional on $W$, for any $\delta \in (0,1)$.

(b) $E(\Delta x'\Delta x\,\Phi \mid W)$ and $E(\Delta x'\Delta x\,\Delta\nu^2\,\Phi \mid W)$ are continuous at $W = 0$ and do not vanish.

(c) $E(\Delta x'\tilde\Lambda\,\Phi \mid W)$ is almost everywhere $r$ times continuously differentiable as a function of $W$ and has bounded derivatives.

ASSUMPTION R7: The function $K: \mathbb{R} \to \mathbb{R}$ satisfies: (a) $\int K(v)\,dv = 1$, (b) $\int |K(v)|\,dv < \infty$, (c) $\sup_v |K(v)| < \infty$, (d) $\int |v|^{r+1}|K(v)|\,dv < \infty$, and (e) $\int v^j K(v)\,dv = 0$ for all $j = 1, \ldots, r$.

ASSUMPTION R8: $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$.

From our analysis in Section 2, it is easy to see that Assumptions R1-R3 would suffice to identify $\beta$ for known $\gamma$. An identification scheme in the spirit of our discussion in Section 2 would obviously require support of $W$ at zero, as well as nonsingularity of the matrix $\Sigma_{xx}$ imposed by Assumption R3, analogous to the familiar full rank assumption.

The continuity of the distribution of the index $W$ imposed in Assumption R4 is a regularity condition common in kernel estimation of density and regression functions. It is precisely this continuity that renders the OLS estimator of $\beta$ of Section 2 infeasible, even if $\gamma$ were known.

9 Notice that by Assumption R1 the functional form of $\Lambda$ is the same over time for the same individual, while by Assumption R2 it is also the same across individuals.

10 In principle, we could dispense with the assumption that $\tilde\Lambda$ is bounded by assuming that it has finite fourth moment conditional on $W$.

Since our estimation scheme is based on pairs of observations for which $W_i \equiv \Delta w_i\gamma \cong 0$, it is obvious that additional smoothness conditions are required. These are imposed by Assumptions R4-R8. Notice in particular Assumption R5, which imposes a Lipschitz continuity property on the selection correction function $\Lambda(\cdot)$. It is easy to see that simple continuity will not be sufficient to guarantee that $\Delta\lambda_i \to 0$ as $W_i \to 0$, since $\Delta\lambda_i$ is not a function of $W_i$. Furthermore, similarly to kernel density and regression estimation, a high order of differentiability $r$ for certain functions of the index $W$, along with the appropriate choice of the kernel function and the bandwidth sequence, imply a faster rate of convergence in distribution for $\tilde\beta_n$. Specifically, we choose a $(r+1)$th order bias-reducing kernel, which by Assumption R7(e) is required to be negative in part of its domain.
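The moment conditions on the kernel can be checked numerically. The sketch below is an illustration only: the Gaussian-based kernel $(3 - v^2)\phi(v)/2$ is a commonly used fourth-order kernel chosen here as an example (it is not taken from the paper), and the helper `kernel_moment` is a hypothetical name. Its moments of order 1, 2, and 3 vanish, so it would correspond to $r = 3$, and it is negative for $|v| > \sqrt{3}$.

```python
import numpy as np


def normal_pdf(v):
    """Standard normal density; a second-order kernel."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)


def fourth_order_kernel(v):
    """Gaussian-based kernel (3 - v^2)/2 * phi(v); negative for |v| > sqrt(3)."""
    return 0.5 * (3.0 - v**2) * normal_pdf(v)


def kernel_moment(kernel, j, lim=12.0, num=200001):
    """Trapezoid approximation to the integral of v**j * K(v) over [-lim, lim]."""
    v = np.linspace(-lim, lim, num)
    f = v**j * kernel(v)
    step = v[1] - v[0]
    return float(np.sum(0.5 * (f[1:] + f[:-1])) * step)
```

Comparing `kernel_moment(normal_pdf, 2)` with `kernel_moment(fourth_order_kernel, 2)` shows the difference between the two kernels: the second moment of the normal density is nonzero, while that of the higher-order kernel vanishes.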

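The next lemma establishes the asymptotic properties of the infeasible estimator $\tilde\beta_n$.

LEMMA 1: Let Assumptions R1-R8 hold. Define

$\Sigma_{xx} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Phi \mid W = 0),$

$\Sigma_{x\nu} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Delta\nu^2\,\Phi \mid W = 0)\int K(v)^2\,dv,$

$\Sigma_{x\lambda} \equiv \frac{g^{(r)}(0)}{r!}\int v^{r+1}K(v)\,dv,$

where $g^{(r)}(0)$ is the $(k \times 1)$ vector of $r$th-order derivatives of $g(W) \equiv E(\Delta x'\tilde\Lambda\,\Phi \mid W)f_W(W)$ evaluated at $W = 0$. Then:

(a) $S_{xx} \xrightarrow{p} \Sigma_{xx}$.

(b) If $\sqrt{n h_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{n h_n}\,S_{x\nu} \xrightarrow{d} N(0, \Sigma_{x\nu})$ and (ii) $\sqrt{n h_n}\,S_{x\lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\lambda}$.

(c) If $\sqrt{n h_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}S_{x\nu} \xrightarrow{p} 0$ and (ii) $h_n^{-(r+1)}S_{x\lambda} \xrightarrow{p} \Sigma_{x\lambda}$.

The asymptotic properties of $\tilde\beta_n$ easily follow from the previous Lemma. If $\sqrt{n h_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{n h_n}(\tilde\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$, while if $\sqrt{n h_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\tilde\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.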

In order to derive the asymptotic properties of the feasible estimator $\hat\beta_n$, we will make the following additional assumptions.

ASSUMPTION R9: In addition to the conditions of Assumption R7, the kernel function satisfies: (a) $K(v)$ is three times continuously differentiable with bounded derivatives, and (b) $\int |K'(v)|\,dv$, $\int |K''(v)|\,dv$, $\int |vK'(v)|\,dv$, and $\int |vK''(v)|\,dv$ are finite.

The conditions of Assumption R9 are satisfied, for example, for $K(v)$ being the standard normal density function, which is a second order kernel.

ASSUMPTION R10: $x_t^*$, $\varepsilon_t^*$, and $w_t$ have bounded $8 + 4\delta$ moments conditional on $W$, for some $\delta \in (0,1)$. In addition, $E(\Delta x_l\Delta\nu\Delta w_j\Phi \mid W)$ and $E(\Delta x_l\Delta\nu\Delta w_j\Delta w_m\Phi \mid W)$ are continuous at $W = 0$, for all $l = 1, \ldots, k$ and $j, m = 1, \ldots, q$.

ASSUMPTION R11: The parameter vector $\gamma$ in the selection equation lies in a compact set, and $\hat\gamma_n$ is a consistent estimator that satisfies $\hat\gamma_n - \gamma = O_p(n^{-p})$, where $2/5 < p \le 1/2$.

For example, $p = 1/2$ if $\gamma$ is estimated by maximizing the conditional likelihood function.

ASSUMPTION R12: $h_n = h\cdot n^{-\mu}$, where $0 < h < \infty$ and $1 - 2p < \mu < p/2$.

Assumption R12 is crucial for establishing the result that follows. This result states that $\hat S_{xx}$, $\hat S_{x\nu}$, and $\hat S_{x\lambda}$ have the same probability limits as their infeasible counterparts $S_{xx}$, $S_{x\nu}$, and $S_{x\lambda}$, provided that the bandwidth sequence $h_n$ is chosen appropriately for any given rate of convergence of the first-step estimator, that is, for any given $p$, and for any degree of smoothness $r$.
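As a small illustration of how the rate conditions interact, the following sketch checks whether a bandwidth exponent is admissible for a given first-step rate, reading the conditions as $2/5 < p \le 1/2$ and $1 - 2p < \mu < p/2$. The function name `r12_admissible` is hypothetical, and exact fractions are used so that boundary cases are not blurred by floating-point rounding.

```python
from fractions import Fraction


def r12_admissible(p, mu):
    """Return True if 2/5 < p <= 1/2 and 1 - 2p < mu < p/2.

    p: first-step rate exponent; mu: bandwidth exponent in h_n = h * n**(-mu).
    Pass Fractions or strings such as "9/20" to keep the comparison exact.
    """
    p, mu = Fraction(p), Fraction(mu)
    rate_ok = Fraction(2, 5) < p <= Fraction(1, 2)
    return rate_ok and (1 - 2 * p < mu < p / 2)
```

For a root-n first step ($p = 1/2$) the admissible range is $0 < \mu < 1/4$, and it narrows as $p$ falls toward $2/5$, where it becomes empty.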

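LEMMA 2: Let Assumptions R1-R12 hold. Then:

(a) $\hat S_{xx} - S_{xx} = o_p(1)$.

(b) If $\sqrt{n h_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{n h_n}(\hat S_{x\nu} - S_{x\nu}) = o_p(1)$ and (ii) $\sqrt{n h_n}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

(c) If $\sqrt{n h_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}(\hat S_{x\nu} - S_{x\nu}) = o_p(1)$ and (ii) $h_n^{-(r+1)}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

Lemma 2 readily implies that if $\sqrt{n h_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{n h_n}(\hat\beta_n - \tilde\beta_n) = o_p(1)$, while if $\sqrt{n h_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \tilde\beta_n) = o_p(1)$. Since $(\hat\beta_n - \beta) = (\hat\beta_n - \tilde\beta_n) + (\tilde\beta_n - \beta)$, we have the following theorem.

THEOREM 1: Let Assumptions R1-R12 hold.

(a) If $\sqrt{n h_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then $\sqrt{n h_n}(\hat\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$.

(b) If $\sqrt{n h_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.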

11 Compactness of the parameter space is required for consistency of both Manski's estimator and the smoothed conditional maximum score estimator, while it is not required for the conditional maximum likelihood estimator. Notice, though, that since $\gamma$ can only be estimated up to scale, we can always normalize it so that it lies on the unit circle. Thus, the compactness assumption is not restrictive.

Thus, in the limit, the fact that we are using $\hat\gamma_n$ to estimate $\beta$ does not affect the asymptotic distribution of $\hat\beta_n$. The lower bound on $\mu$ imposed by Assumption R12 is the key for this result to hold. In words, this bound implies that $\beta$ is estimated at a rate slower than $\gamma$. Indeed, from Theorem 1, the rate of convergence of $\hat\beta_n$ is $(n h_n)^{-1/2} = O(n^{-(1-\mu)/2})$, which is obviously slower than $n^{-p}$, since $\mu > 1 - 2p$. Thus, in effect, Assumption R12 requires that $\sqrt{n h_n}(\hat\gamma_n - \gamma) = o_p(1)$.

In principle, we could allow $\beta$ to be estimated at the same rate as $\gamma$. Thus, if $\sqrt{n h_n}(\hat\gamma_n - \gamma) = O_p(1)$, for $\sqrt{n h_n}\,h_n^{r+1} \to \tilde h$ we obtain the following asymptotic representation, which may be easily derived from the analysis of Lemma 2(b) in the Appendix:

$\sqrt{n h_n}(\hat\beta_n - \tilde\beta_n) = \Sigma_{xx}^{-1}\,\Omega\,\sqrt{n h_n}(\hat\gamma_n - \gamma) + o_p(1),$

where

$\Omega = \mathrm{plim}\ \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n^2}K'\Big(\frac{W_i}{h_n}\Big)\Delta x_i'\,\Delta w_i\,\Delta\lambda_i\,\Phi_i,$

provided that $E(\Delta x'\Delta w\,\tilde\Lambda\,\Phi \mid W)$ is continuous at $W = 0$ and $vK(v) \to 0$ as $|v| \to \infty$. Asymptotic normality of $\hat\beta_n$ may still be established if $\sqrt{n h_n}(\hat\gamma_n - \gamma)$ has an asymptotic representation of the form $\sqrt{n h_n}(\hat\gamma_n - \gamma) = (1/\sqrt{n h_n})\sum_{i=1}^n \varphi_n(\Delta w_i, \Delta d_i, \gamma) + o_p(1)$.

At first glance, it looks attractive to eliminate the asymptotic bias of $\hat\beta_n$ by choosing $h_n$ so that $\sqrt{n h_n}\,h_n^{r+1} \to \tilde h = 0$, or equivalently by setting $\mu > 1/(2(r+1)+1)$. In that case, however, the rate of convergence of $\hat\beta_n$ is lower than when $\tilde h > 0$. Indeed, the rate of convergence in distribution of $\hat\beta_n$ is maximized by making $\mu$ as small as possible, that is, by setting $\mu = 1/(2(r+1)+1)$, in which case it becomes $n^{-(r+1)/(2(r+1)+1)}$. Thus, for $r$ large enough, the estimator converges at a rate that can be arbitrarily close to $n^{-1/2}$, provided also that $\gamma$ is estimated fast enough, that is, provided $p > (r+1)/(2(r+1)+1)$.
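To make the rate arithmetic concrete, the sketch below computes the bandwidth $h_n = h\cdot n^{-\mu}$ with $\mu = 1/(2(r+1)+1)$ and the implied convergence exponent $(r+1)/(2(r+1)+1)$. The function names and the default constant `h=1.0` are illustrative assumptions.

```python
def bandwidth(n, r, h=1.0):
    """Bandwidth h_n = h * n**(-mu) with mu = 1/(2(r+1)+1)."""
    mu = 1.0 / (2 * (r + 1) + 1)
    return h * n ** (-mu)


def rate_exponent(r):
    """Exponent a such that the estimator converges at rate n**(-a)."""
    return (r + 1) / (2 * (r + 1) + 1)
```

With $r = 1$ the exponent is $2/5$, and it approaches $1/2$ from below as $r$ grows.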

Although the proposed estimator is asymptotically biased, it is possible to eliminate the asymptotic bias while maintaining the maximal rate of convergence, in the manner suggested by Bierens (1987).

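12 We can also derive an asymptotic representation for $\hat\beta_n$ in the case where $\gamma$ is estimated at rate $n^{-p}$ that is slower than $1/\sqrt{n h_n}$. In this case we obtain $n^p(\hat\beta_n - \beta) = \Sigma_{xx}^{-1}\Omega\,n^p(\hat\gamma_n - \gamma) + o_p(1)$, which implies that $\hat\beta_n$ converges at the same rate as $\hat\gamma_n$, which is slower than the optimal rate obtained for the infeasible estimator $\tilde\beta_n$, that is, when $\gamma$ is known.

COROLLARY: Let $\hat\beta_n$ be the estimator with window width $h_n = h\cdot n^{-1/(2(r+1)+1)}$, and $\hat\beta_{n,\delta}$ the estimator with window width $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, where $\delta \in (0,1)$. Define

$\hat{\hat\beta}_n \equiv \frac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\,\hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}}.$

Then $n^{(r+1)/(2(r+1)+1)}(\hat{\hat\beta}_n - \beta) \xrightarrow{d} N(0,\ h^{-1}\Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$.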
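The bias-correcting combination can be sketched in a few lines. This is an illustration under the assumption that the two estimates were computed with bandwidths $h\cdot n^{-1/(2(r+1)+1)}$ and $h\cdot n^{-\delta/(2(r+1)+1)}$; the function name `bias_corrected` is hypothetical.

```python
import numpy as np


def bias_corrected(beta_n, beta_n_delta, n, r, delta):
    """Linear combination of two estimates in which the leading bias cancels.

    beta_n: estimate with bandwidth exponent 1/(2(r+1)+1);
    beta_n_delta: estimate with bandwidth exponent delta/(2(r+1)+1).
    """
    a = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    return (np.asarray(beta_n) - a * np.asarray(beta_n_delta)) / (1.0 - a)
```

If the two estimates differ from the true value only by bias terms proportional to their respective bandwidths raised to the power $r + 1$, the combination returns the true value exactly, which is the mechanism behind the correction.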

In order to compute $\hat\beta_n$ or $\hat{\hat\beta}_n$ in an application, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator will likely be more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant $h$. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.

For a given order of differentiability $r$ and a given sample size $n$, the results of Theorem 1 suggest that $h_n = h\cdot n^{-\mu}$ be chosen so that $\mu = 1/(2(r+1)+1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant $h$. A natural way to proceed (see Horowitz (1992) and Härdle (1990)) is to choose $h$ so as to minimize some kind of measure of the "distance" of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

$\mathrm{MSE} = n^{-2(r+1)/(2(r+1)+1)}\big\{h^{2(r+1)}\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda} + h^{-1}\,\mathrm{trace}[\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\nu}]\big\}$

for any nonstochastic positive semidefinite matrix $A$ that satisfies $\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda} \ne 0$. It is straightforward to show that MSE is minimized by setting

(3.21) $h = h^* = \Big\{\frac{\mathrm{trace}[\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\nu}]}{2(r+1)\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda}}\Big\}^{1/(2(r+1)+1)}.$

This last expression suggests that we may construct a consistent estimate of $h^*$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{x\nu}$, and $\Sigma_{x\lambda}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $n h_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{x\nu}$ and $\Sigma_{x\lambda}$.
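As an illustration of the plug-in idea, the sketch below evaluates an MSE-minimizing constant of the form {trace term / (2(r+1) times bias term)} raised to the power 1/(2(r+1)+1), given estimates of the three matrices. The function name `optimal_bandwidth_constant`, the argument names, and the identity default for the weighting matrix `A` are assumptions made for the example.

```python
import numpy as np


def optimal_bandwidth_constant(sigma_xx, sigma_xnu, sigma_xlam, r, A=None):
    """MSE-minimizing constant h* from (estimates of) the limit matrices.

    sigma_xx, sigma_xnu: (k, k) arrays; sigma_xlam: (k,) array;
    A: (k, k) positive semidefinite weighting matrix (identity if None).
    """
    sigma_xx = np.atleast_2d(np.asarray(sigma_xx, dtype=float))
    sigma_xnu = np.atleast_2d(np.asarray(sigma_xnu, dtype=float))
    b = np.asarray(sigma_xlam, dtype=float).ravel()
    k = sigma_xx.shape[0]
    A = np.eye(k) if A is None else np.atleast_2d(np.asarray(A, dtype=float))
    sxx_inv = np.linalg.inv(sigma_xx)
    middle = sxx_inv @ A @ sxx_inv
    variance_part = np.trace(middle @ sigma_xnu)   # trace term
    bias_part = float(b @ middle @ b)              # quadratic form in the bias
    if bias_part <= 0:
        raise ValueError("bias term must be strictly positive")
    ratio = variance_part / (2 * (r + 1) * bias_part)
    return float(ratio ** (1.0 / (2 * (r + 1) + 1)))
```

A larger variance term pushes the constant up (more smoothing), while a larger bias term pushes it down.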

THEOREM 2: Assume that Assumptions R1-R12 hold. (a) Let $\hat\beta_n$ be a consistent estimator of $\beta$ based on $h_n = h\cdot n^{-1/(2(r+1)+1)}$, and define $\hat e_i \equiv \Delta y_i - \Delta x_i\hat\beta_n$. Then

(b) Let $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then, for $\hat e_i$ defined as in part (a),

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's world wide web page.

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given $r$ and $n$, choose any $h_n = h\cdot n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, with $h$ an arbitrary positive constant and $0 < \delta < 1$. Compute $\hat\beta_n$ based on $h_n$ and construct $\hat e_i$ as defined in Theorem 2. Use $\hat e_i$ to compute the estimates of $\Sigma_{xx}$, $\Sigma_{x\nu}$, and $\Sigma_{x\lambda}$ as discussed above. Then estimate $h^*$ by $\hat h^*$, using equation (3.21) with $\Sigma_{xx}$, $\Sigma_{x\nu}$, and $\Sigma_{x\lambda}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h^*$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the "plug-in" method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant $h$. Second, by specifying the order of differentiability $r$, the researcher is restricted to a certain smoothness class.

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h \cdot n^{-1/(2(r+1)+1)}$, run an OLS regression of $\tilde y_i = \sqrt{K(\Delta w_i \hat\gamma_n / h_n)}\, \Delta y_i \Phi_i$ on $\tilde x_i = \sqrt{K(\Delta w_i \hat\gamma_n / h_n)}\, \Delta x_i \Phi_i$, and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix, using the residuals from the regression, $\hat e_i = \tilde y_i - \tilde x_i \hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\tilde y_i$ on $\tilde x_i$ with bandwidth $h_n = h_{n,\delta}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$ in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency^{14} of $\hat\beta_n$ if $h_n^{-2}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e., for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{nh_n}\, h_n^{r+1} \to \sqrt{\tilde h}$ with $0 \le \tilde h < \infty$. The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-n asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.
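The OLS-based computation of the main-equation estimate described earlier in this section (a regression on kernel-weighted first differences, with Eicker-White standard errors) can be sketched in a few lines. This is a minimal illustration, not the author's code: the function name `kernel_weighted_ols`, its argument names, and the choice of a standard normal kernel are assumptions, and `index_diff` stands for the estimated first-differenced selection index.

```python
import numpy as np

def normal_kernel(v):
    """Standard normal density, used here as the kernel K."""
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)

def kernel_weighted_ols(dy, dx, index_diff, both_selected, h):
    """OLS of sqrt(K) * dy on sqrt(K) * dx with heteroskedasticity-robust SEs.

    dy            : (n,) first differences of the outcome
    dx            : (n, k) first differences of the regressors
    index_diff    : (n,) first-differenced selection index at the estimated gamma
    both_selected : (n,) True if the individual is observed in both periods
    h             : bandwidth
    """
    sel = np.asarray(both_selected, dtype=bool)
    # Outcomes of non-selected individuals are unobserved; zero them out so
    # that missing values cannot leak into the weighted sums.
    dy = np.where(sel, np.asarray(dy, dtype=float), 0.0)
    dx = np.asarray(dx, dtype=float).reshape(len(dy), -1)
    root_w = np.sqrt(normal_kernel(np.asarray(index_diff, dtype=float) / h)) * sel
    y_t = root_w * dy
    x_t = root_w[:, None] * dx
    xtx_inv = np.linalg.inv(x_t.T @ x_t)
    beta = xtx_inv @ (x_t.T @ y_t)
    resid = y_t - x_t @ beta
    meat = (x_t * resid[:, None] ** 2).T @ x_t  # sum of e_i^2 * x_i x_i'
    cov = xtx_inv @ meat @ xtx_inv              # Eicker-White sandwich
    return beta, np.sqrt(np.diag(cov))
```

As the bandwidth grows, the kernel weights become constant and the function reduces to OLS on the first differences of the individuals observed in both periods.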

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3} h_n^2 \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator which, under weak (but stronger than Manski's) assumptions, is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

14 Consistency of $\hat\beta_n$ may be established under a weaker restriction on the rate at which $\|\hat\gamma_n - \gamma\|$ converges to zero. The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion. This modification does not alter the basic restriction for obtaining an asymptotic distribution for $\hat\beta_n$ which does not depend on the estimation of $\gamma$ in the first step, namely that $\gamma$ has to be estimated at a faster rate than $\beta$. Notice that in this case the upper bound on $\mu$ in Assumption R12 would have to be replaced by $(6p - 1)/7$. However, this modification would affect the proof of Theorem 2, which would become unnecessarily complicated and long.

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, the first step estimation method, the performance in practice of the proposed plug-in method for estimating the bandwidth constant, and finally the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where p O = 1 y = y = 1 w and w2 are independent N( -1 l ) variables q = (w + w)2 + 25 with 5 an independent variable distributed uni- formly over the interval (01) u is logistically distributed normalized to have variance equal to 1 x= w a = + w )2 + 5 with 5 an indepen- (w dent N(0 2) variable and s = 08t3 + 06ul with 5 an independent standard normal variable All data are generated iid across individuals and over time This design implies that Pr(d + d = 1)= 037 and Pr(d = d = 1) = 031 so that approximately 37 percent of each sample is used in the first step estimation of the selection equation and approximately 31 percent in the second step Each Monte Carlo experiment is performed 1000 times while the same pseudoran- dom number sequences are used for each one of three different sample sizes n 250 1000 and 4000

Table I presents the finite sample properties of the "naive" estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences, using only those individuals that are selected into the sample in both time periods, i.e., those that have $d_1 = d_2 = 1$. This estimator may be viewed as a limiting case of our proposed estimator with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes n. As the estimator may not have a finite mean or variance in any finite sample, we also report its median bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$, at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$ (columns: Mean Bias, Median Bias, RMSE, MAD)

Panel B: Sizes of t tests (columns: 0.01, 0.05, 0.10, 0.20)
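The four summary statistics reported in the tables of this section can be computed from a vector of replicated estimates as sketched below. The function name `monte_carlo_summary` is hypothetical, and the MAD is taken here to be the median absolute deviation from the true parameter value, which is an assumption about the definition used in the tables.

```python
import numpy as np

def monte_carlo_summary(estimates, true_value):
    """Mean bias, median bias, RMSE and MAD of replicated estimates."""
    err = np.asarray(estimates, dtype=float) - true_value
    return {
        "mean_bias": float(err.mean()),
        "median_bias": float(np.median(err)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        # Assumption: deviations are measured from the true value.
        "mad": float(np.median(np.abs(err))),
    }
```

The median-based measures remain well defined even when the estimator has no finite mean or variance, which is the reason given in the text for reporting them alongside the mean bias and RMSE.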

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(v) = \phi(v)$, where $\phi$ is the density of the standard normal distribution,

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Left panels: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right panels: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).

Columns in each panel: Mean Bias, Median Bias, RMSE, MAD

Panel A (True $\gamma$): 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B ($\hat\gamma_L$): 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C ($\hat\gamma_{CMS}$): 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D ($\hat\gamma_{SCMS4}$): 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E ($\hat\gamma_{SCMS2}$): 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098


which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h \cdot n^{-1/(2(r+1)+1)} = h \cdot n^{-1/5}$ with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.^{15} In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_L$, which in this case will be consistent since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,^{16} denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator, denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E) the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window

15 In the construction of the kernel weights of both the infeasible estimator of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i=1}^{n} \Delta d_i \, 1\{\Delta w_{1i} g_1 + \Delta w_{2i} g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.
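The grid search described in this footnote can be sketched as follows. The sketch assumes the objective is the sample average of $\Delta d_i$ times the indicator that the differenced index is nonnegative, with two regressors in the selection equation; the function name `cms_grid_search` is hypothetical, and the grid excludes the endpoint $2\pi$ because it duplicates the direction at 0.

```python
import numpy as np

def cms_grid_search(dd, dw, n_grid=2000):
    """Conditional maximum score estimate of a unit-norm direction in R^2.

    dd : (n,) first differences of the selection indicator (values -1, 0, 1)
    dw : (n, 2) first differences of the selection-equation regressors
    Returns (angle, direction, objective value at the maximizer).
    """
    dd = np.asarray(dd, dtype=float)
    dw = np.asarray(dw, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    directions = np.column_stack([np.sin(angles), np.cos(angles)])  # (g1, g2)
    index = dw @ directions.T                       # shape (n, n_grid)
    objective = (dd[:, None] * (index >= 0)).mean(axis=0)
    best = int(np.argmax(objective))                # first maximizer on the grid
    return angles[best], directions[best], objective[best]
```

Because the objective is a step function of the angle, the maximizer is generally an interval of grid points; `np.argmax` returns the first of them. Individuals whose selection status does not change have $\Delta d_i = 0$ and contribute nothing, so only the "switchers" carry information about the direction.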

17 The SCMS estimates are computed by maximizing

over all $g \in \mathbb{R}^2$ that have $|g_2| = 1$ and $g_1$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(v) = K_4(v)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(v) = \Phi(v)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$, and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5 \cdot n^{-1/(2m+1)}$ ($m = 2$ or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).


width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\hat\beta_n$ and $\hat{\hat\beta}_n$ using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).^{18}

In Panels C and D we use a fourth and a sixth order bias-reducing kernel^{19} and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h$ is larger than the initial constant (here the initial bandwidth constant is set equal to one^{20}). Table V displays the mean of $\hat h$ across 1000 replications for different specifications of the initial constant for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias, we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).
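One way to carry out the calculation in this footnote is sketched below. It rests on two assumptions that the footnote does not spell out: that the rule-of-thumb bandwidth is $1.06\,\hat\sigma\,N^{-1/5}$, and that the standard error of a sample median is computed from the usual asymptotic formula $1 / (2 \hat f(m) \sqrt{N})$, where $\hat f(m)$ is the estimated density at the median and $N$ is the number of replications. All function names are hypothetical.

```python
import numpy as np

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth for a normal kernel (assumed form: 1.06 * sd * N^(-1/5))."""
    x = np.asarray(x, dtype=float)
    return 1.06 * x.std(ddof=1) * len(x) ** (-0.2)

def normal_kde_at(x, point, h):
    """Normal-kernel density estimate of the sample x, evaluated at one point."""
    z = (point - np.asarray(x, dtype=float)) / h
    return float(np.mean(np.exp(-0.5 * z ** 2)) / (h * np.sqrt(2.0 * np.pi)))

def median_standard_error(x):
    """Asymptotic standard error of the sample median of x."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    f_med = normal_kde_at(x, med, silverman_bandwidth(x))
    return 1.0 / (2.0 * f_med * np.sqrt(len(x)))
```

Applied to the replicated estimates of one experiment, `median_standard_error` gives a standard error for the reported median bias, since subtracting the true value shifts the median without changing its sampling variability.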

19 The fourth-order kernel is $K(v) = 1.1\exp(-v^2/2) - 0.1\exp(-v^2/(2 \cdot 11))(1/\sqrt{11})$, and the sixth-order kernel is $K(v) = 1.5\exp(-v^2/2) - 0.6\exp(-v^2/(2 \cdot 4))(1/\sqrt{4}) + 0.1\exp(-v^2/(2 \cdot 9))(1/\sqrt{9})$. See Bierens (1987).
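The bias-reducing property of these kernels is a statement about their moments: a kernel of order $r + 1$ integrates to one and has vanishing moments of orders 1 through $r$. The sketch below checks this numerically for the two mixtures of normal densities in the footnote. It assumes each exponential term is scaled by $1/\sqrt{2\pi}$ so that the kernel integrates to one; the function names are hypothetical.

```python
import numpy as np

def normal_pdf(v, var=1.0):
    """Density of N(0, var) evaluated at v."""
    return np.exp(-0.5 * v ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def kernel_fourth_order(v):
    return 1.1 * normal_pdf(v, 1.0) - 0.1 * normal_pdf(v, 11.0)

def kernel_sixth_order(v):
    return (1.5 * normal_pdf(v, 1.0)
            - 0.6 * normal_pdf(v, 4.0)
            + 0.1 * normal_pdf(v, 9.0))

def moment(kernel, j, half_width=80.0, n_points=400_001):
    """Numerical j-th moment of a kernel via a Riemann sum on a fine grid."""
    v = np.linspace(-half_width, half_width, n_points)
    dv = v[1] - v[0]
    return float(np.sum(v ** j * kernel(v)) * dv)
```

For the fourth-order kernel the second moment is $1.1 - 0.1 \cdot 11 = 0$, and for the sixth-order kernel both the second moment, $1.5 - 0.6 \cdot 4 + 0.1 \cdot 9$, and the fourth moment, $3(1.5 - 0.6 \cdot 16 + 0.1 \cdot 81)$, are zero, which `moment` confirms numerically. The normalizing constant does not affect the kernel-weighted OLS estimate itself, since it cancels between numerator and denominator.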

20 We chose the initial h equal to one as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 05 to 10 was performed for a sample size n = 100000


TABLE III

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; TRUE $\gamma$

Left panels: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right panels: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).

Columns in each panel: Mean Bias, Median Bias, RMSE, MAD

Panel A: $K(v) = \phi(v)$, $h_n = 0.5 \cdot n^{-1/5}$
0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B: $K(v) = \phi(v)$, $h_n = 3 \cdot n^{-1/5}$
0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C: fourth-order kernel, $h_n = n^{-1/9}$
0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D: sixth-order kernel, $h_n = n^{-1/13}$
0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for n = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

The estimated standard errors of the median bias estimates for n = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = \hat h \cdot n^{-1/5}$, INITIAL $h = 1$, $K(v) = \phi(v)$

Left panels: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right panels: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).

Columns in each panel: Mean Bias, Median Bias, RMSE, MAD

Panel A (True $\gamma$): 0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B ($\hat\gamma_L$): 0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C ($\hat\gamma_{CMS}$): 0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D ($\hat\gamma_{SCMS4}$): 0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401


TABLE V

Columns: Initial $h = 0.5$, Initial $h = 1$, Initial $h = 2$, Initial $h = 3$

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a normal; $h_n = n^{-1/5}$, $K(v) = \phi(v)$. Top row (Figures 1a-1c): true $\gamma$. Note: Figures 1a, 1d, 1g: n = 250; Figures 1b, 1e, 1h: n = 1000; Figures 1c, 1f, 1i: n = 4000.

Finally, we examine the size of t tests where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$. To this end, we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
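The rejection frequencies of the kind reported in Table VI can be computed from the replicated estimates and their standard errors as sketched below. The sketch assumes two-sided tests with standard normal critical values; the function name `empirical_size` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def empirical_size(estimates, std_errors, null_value,
                   levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of replications in which a two-sided t test rejects the null."""
    t_stats = (np.asarray(estimates, dtype=float) - null_value) \
        / np.asarray(std_errors, dtype=float)
    sizes = {}
    for level in levels:
        critical = NormalDist().inv_cdf(1.0 - level / 2.0)
        sizes[level] = float(np.mean(np.abs(t_stats) > critical))
    return sizes
```

If the normal approximation and the covariance estimator are both accurate, each rejection frequency should be close to its nominal level; systematic over-rejection points to bias in the estimator or to standard errors that are too small.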

TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Left panels: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right panels: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).

Nominal sizes in each panel: 0.01, 0.05, 0.10, 0.20

Panel A (True $\gamma$): 0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B ($\hat\gamma_L$): 0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C ($\hat\gamma_{CMS}$): 0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D ($\hat\gamma_{SCMS}$): 0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250


5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to "difference out" both the unobservable individual effect and the sample selection effect, in a manner similar to the "fixed-effects" approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas which maintain Assumptions R4 and R8 of Section 3

LEMMAAl Let S = is a random sam- (ln)Z=l ( l h )L (M h )Z v s 2 0 where (Z y)]= ple from a disirrbuiron that has E ( I Z I ~ I for L ~at~sfiesW )ltM lt almost all W and the functron

P ~ l v(v ) l d v lt M Then E ( S ) = O ( k i ) and var(S) = O(hnh) Tlzus for s 2 1 S + 0 while for

P s = 0 S +f(O)E(ZI W = O)lL(v)d v procrded that E ( Z I W ) rs contrnuo~ts at W = 0


PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

LEMMAA2 (Liapounov CLT for doublc arrays) Let = (1 l t l )~= I tiwhere an Independent sequence of scalar random ~arrables that satis$es E( (I0 var( (I lt rn var(= +

V lt aand I3= El ( 61 +0 for some 8 E (01) as n + Then Jizh~N(0 V)

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1973).

COROLLARY = where (Z U)l= 1s a random sample from a Al Let ( ( I amp)L(w~)z d~stnbutlonsuch that E(ZI W) = 0 and E(IZI 1 W) lt M lt w for almost all W E(Z2 I W) IS conhnuous at W = 0 and the functlon L satrsfies llL(v)l dv lt 53 Then KS= ( l ix)~l=amp N(0

f W ( 0 ) ~ ( Z 2 I ~ =O ) ~ L ( V ) ~ ~ V )

PROOFOF LEMMA1 (a) Apply Lemma A1 with 2= Ax Ax di (1 j = I k) s = 0 and L(v ) = K(v)

(b-i) Apply Lemma A2 with tt= c1(1 amp)K(Uh) Ax Ac where c is a k X 1 vector of constants such that cc = 1

(b-ii) Note that by Assumption R5 Ah = AWThus wc may write

S = Ax( 1 ~ 1 ) I 3 ~ = ~ ( l h ) K ( H ( h )

Therefore E(S) = l(lh)K(Wh)Wg(W) dW where g(W) - E(Axr AlW)fw(W) is by assumption r times colltinuously differentiable with derivatives that are bounded on the support of W and has g(0) lt m A Taylor series expansion of g() around 0 and a change of variables W = vh lead to

17

1


for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKS: (i) In what follows, M stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$. (iii) In the Taylor series expansions, c stands for a generic value between U and

PROOFOF LEMMA2 (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a


(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that $A_1$ and $A_2$ are $O_p(1)$ while $A_3 = o_p(1)$. The desired result will then follow from the fact that $\mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$.

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next observe that, since $p > 2/5$ and $\mu < p/2$ imply that $(1/2) + (7\mu/2) - 3p < 0$,

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

We will show that Bl and B are 0(1) while B3 = o(l) Thc desired result will thenfollow from the fact that 1 - 2 p lt u lt p 2 implies that hi1(- y ) = Op(nL-) = o(l) and - y ) = o(n -~-) = o(l)


Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for r r 1

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+


where Ai and A are defined as in the proof of part (b-1) As we showed there both these quantities are bounded in probability for any h that satisfies h -O and nh -t 13 as n increases Furthermore from (1) above hi1( - y) = op(l) T ~ L I S OP(nF-) = the first two terms of the sum above are o(l) Now by (21

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where Bi and B2 are defined as in the proof of part (b-ii) and as we showed there they arc houndcd in probability for any I that satisfies nh + 13 as n increases Thus the first two terms of the sum above are o(l) Furthermore

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

------ (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

------ (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables, and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

------ (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

------ (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

------ (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

------ (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

------ (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

------ (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

------ (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.


httplinksjstororgsicisici=0012-96822819920529603A33C5333ATLALSE3E20CO3B2-2

A Smoothed Maximum Score Estimator for the Binary Response ModelJoel L HorowitzEconometrica Vol 60 No 3 (May 1992) pp 505-531Stable URL

httplinksjstororgsicisici=0012-96822819920529603A33C5053AASMSEF3E20CO3B2-M

Cube Root AsymptoticsJeankyung Kim David PollardThe Annals of Statistics Vol 18 No 1 (Mar 1990) pp 191-219Stable URL

httplinksjstororgsicisici=0090-53642819900329183A13C1913ACRA3E20CO3B2-A

httpwwwjstororg

LINKED CITATIONS- Page 2 of 3 -

NOTE The reference numbering from the original has been maintained in this citation list

Semiparametric Analysis of Random Effects Linear Models from Binary Panel DataCharles F ManskiEconometrica Vol 55 No 2 (Mar 1987) pp 357-362Stable URL

httplinksjstororgsicisici=0012-96822819870329553A23C3573ASAOREL3E20CO3B2-H

Nonresponse in Panel Data The Impact on Estimates of a Life Cycle Consumption FunctionTheo Nijman Marno VerbeekJournal of Applied Econometrics Vol 7 No 3 (Jul - Sep 1992) pp 243-257Stable URL

httplinksjstororgsicisici=0883-7252281992072F092973A33C2433ANIPDTI3E20CO3B2-Y

Testing for Selectivity Bias in Panel Data ModelsMarno Verbeek Theo NijmanInternational Economic Review Vol 33 No 3 (Aug 1992) pp 681-703Stable URL

httplinksjstororgsicisici=0020-65982819920829333A33C6813ATFSBIP3E20CO3B2-Z

httpwwwjstororg

LINKED CITATIONS- Page 3 of 3 -

NOTE The reference numbering from the original has been maintained in this citation list

Page 9: Estimation of a Panel Data Sample Selection Model ... · The JSTOR Archive is a trusted digital repository providing for long-term preservation and access to leading academic journals

1342 EKATERINI KYRIAZIDOU

estimation scheme may be used for estimating $\beta$ from a truncated sample, in which case estimation of the selection equation is infeasible. An obvious drawback of this method is that, in order to consistently estimate the entire parameter vector $\beta$, we would have to impose the restriction that $w_{it}$ and $x^*_{it}$ do not contain any elements in common.

The above analysis extends naturally to the case of a longer (and possibly unbalanced) panel, that is, when $T_i \geq 2$. Then $\beta$ could be estimated from those observations that have $d_{it} = d_{is} = 1$ and for which $w_{it}\gamma$ and $w_{is}\gamma$ are close, for all $s, t = 1, \ldots, T_i$. The estimator is of the form

where

In the following section we derive the asymptotic properties of our proposed estimator for the main equation of interest, under the assumption that $\gamma$ has been consistently estimated. At the end of the section we examine the applicability of existing estimators for obtaining first-step estimates of the selection equation.

3. ESTIMATION OF THE MAIN EQUATION

3.1. Asymptotic Properties of the Estimator

The derivation of the large sample properties of $\hat\beta_n$ of equations (2.3) and (2.4) proceeds in two steps. First, the asymptotic behavior of the infeasible estimator, which uses the true $\gamma$ in the construction of the kernel weights and is denoted by $\tilde\beta_n$, is analyzed. Then the large sample behavior of the difference $(\hat\beta_n - \tilde\beta_n)$ is investigated.

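It will be useful to define the scalar index $W_i \equiv \Delta w_i\gamma$ and its estimated counterpart $\hat W_i \equiv \Delta w_i\hat\gamma_n$, along with the following quantities:

$$S_{xx} \equiv \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\,\Delta x_i\,\Phi_i, \qquad S_{xv} \equiv \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\,\Delta v_i\,\Phi_i, \qquad S_{x\lambda} \equiv \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\,\Delta\lambda_i\,\Phi_i,$$

with $\hat S_{xx}$, $\hat S_{xv}$, and $\hat S_{x\lambda}$ defined analogously, using $\hat W_i$ in place of $W_i$.

With these definitions we can write $\tilde\beta_n - \beta = S_{xx}^{-1}(S_{xv} + S_{x\lambda})$ and $\hat\beta_n - \beta = \hat S_{xx}^{-1}(\hat S_{xv} + \hat S_{x\lambda})$.

Our asymptotic results for the infeasible estimator are based on the following assumptions. From Section 2, $\Phi_i \equiv d_{i1}d_{i2}$, $\zeta_i \equiv (w_{i1}, w_{i2}, x^*_{i1}, x^*_{i2}, \alpha_i, \eta_i)$, and $v_{it} \equiv \varepsilon^*_{it} - E(\varepsilon^*_{it} \mid d_{i1} = 1, d_{i2} = 1, \zeta_i)$.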

ASSUMPTION R1: $(\varepsilon^*_{i1}, \varepsilon^*_{i2}, u_{i1}, u_{i2})$ and $(\varepsilon^*_{i2}, \varepsilon^*_{i1}, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$. That is, $F(\varepsilon^*_{i1}, \varepsilon^*_{i2}, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon^*_{i2}, \varepsilon^*_{i1}, u_{i2}, u_{i1} \mid \zeta_i)$.

As discussed in Section 2, this conditional exchangeability assumption is crucial to our method for eliminating the sample selection effect. Although in principle we could allow $F$ to vary across individuals, it will be convenient for our analysis to assume that cross-section sampling is random.

ASSUMPTION R2: An i.i.d. sample $\{(x^*_{it}, \varepsilon^*_{it}, \alpha_i, w_{it}, \eta_i, u_{it});\ t = 1, 2\}_{i=1}^{n}$ is drawn from the population. For each $i = 1, \ldots, n$ and each $t = 1, 2$, we observe $(d_{it}, w_{it}, y_{it}, x_{it})$.

With this assumption we may from now on drop the subscripts $i$ that denote the identity of each panel member.

ASSUMPTION R3: $E(\Delta x'\Delta x \mid W = 0)$ is finite and nonsingular.$^{7}$

Note that this assumption implicitly imposes an exclusion restriction on the set of regressors, namely that at least one of the variables in the selection equation, $w_{it}$, is not contained in $x^*_{it}$.

ASSUMPTION R4: The marginal distribution of the index function $W_i \equiv \Delta w_i\gamma$ is absolutely continuous, with density function $f_W$, which is bounded from above on its support and strictly positive at zero, i.e., $f_W(0) > 0$. In addition, $f_W$ is almost everywhere $r$ times ($r \geq 1$) continuously differentiable and has bounded derivatives.$^{8}$

7 Observe that by definition $\Delta x_i = \Phi_i\,\Delta x^*_i$. Thus, although certain assumptions are stated in terms of the observed regressors $x_{it}$, they also hold for the latent (possibly unobserved) $x^*_{it}$.

8 It is possible to relax certain smoothness assumptions so that they hold only in a neighborhood of $W$ near zero, at the cost, though, of more technical detail.

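ASSUMPTION R5: The unknown function$^{9}$ $\Lambda(w_{it}\gamma + \eta_i, w_{i\tau}\gamma + \eta_i, \zeta_i) \equiv E(\varepsilon^*_{it} \mid d_{it} = 1, d_{i\tau} = 1, \zeta_i) = E(\varepsilon^*_{it} \mid u_{it} \leq w_{it}\gamma + \eta_i, u_{i\tau} \leq w_{i\tau}\gamma + \eta_i, \zeta_i) \equiv \Lambda(s_{it}, s_{i\tau}, \zeta_i)$ satisfies $\Lambda(s_{it}, s_{i\tau}, \zeta_i) - \Lambda(s_{i\tau}, s_{it}, \zeta_i) = \bar\Lambda_i\,(s_{it} - s_{i\tau})$ for $t, \tau = 1, 2$, where $\bar\Lambda_i$ is a function of $(s_{it}, s_{i\tau}, \zeta_i)$, i.e., $\bar\Lambda_i \equiv \bar\Lambda(s_{it}, s_{i\tau}, \zeta_i)$, which is bounded on its support.

This assumption is crucial to our analysis. It will be satisfied, for example, if $\Lambda$ is continuously differentiable with respect to its first two arguments, with bounded first-order partial derivatives (as, for example, when the errors are jointly normally distributed), in which case we may apply the multivariate mean-value theorem:

$$\Lambda(s_{i1}, s_{i2}, \zeta_i) - \Lambda(s_{i2}, s_{i1}, \zeta_i) = \left[\Lambda^{(1)}(c^*_i) - \Lambda^{(2)}(c^*_i)\right](s_{i1} - s_{i2}).$$

Here $\Lambda^{(j)}$ ($j = 1, 2$) denotes the first-order partial derivative of $\Lambda$ with respect to its first and second argument, respectively, and $c^*_i$ lies on the line segment connecting $(w_{i1}\gamma + \eta_i, w_{i2}\gamma + \eta_i)$ and $(w_{i2}\gamma + \eta_i, w_{i1}\gamma + \eta_i)$. Thus, in this case, $\bar\Lambda_i = \Lambda^{(1)}(c^*_i) - \Lambda^{(2)}(c^*_i)$, and by assumption will be bounded.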

ASSUMPTION R6: (a) $x^*_t$ and $\varepsilon^*_t$ have bounded $4 + 2\delta$ moments conditional on $W$, for any $\delta \in (0, 1)$.

(b) $E(\Delta x'\Delta x \mid W)$ and $E(\Delta x'\Delta x\,\Delta v^2 \mid W)$ are continuous at $W = 0$ and do not vanish.

(c) $E(\Delta x'\bar\Lambda \mid W)$ is almost everywhere $r$ times continuously differentiable as a function of $W$ and has bounded derivatives.

ASSUMPTION R7: The function $K: \Re \to \Re$ satisfies: (a) $\int K(v)\,dv = 1$, (b) $\int |K(v)|\,dv < \infty$, (c) $\sup_v |K(v)| < \infty$, (d) $\int |v|^{r+1}|K(v)|\,dv < \infty$, and (e) $\int v^j K(v)\,dv = 0$ for all $j = 1, \ldots, r$.

ASSUMPTION R8: $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$.

From our analysis in Section 2 it is easy to see that Assumptions R1-R3 would suffice to identify $\beta$ for known $\gamma$. An identification scheme in the spirit of our discussion in Section 2 would obviously require support of $W$ at zero, as well as nonsingularity of the matrix $\Sigma_{xx}$ imposed by Assumption R3, analogous to the familiar full rank assumption.

The continuity of the distribution of the index $W$ imposed in Assumption R4 is a regularity condition common in kernel estimation of density and regression functions. It is precisely this continuity that renders the estimator of $\beta$ of Section 2 infeasible even if $\gamma$ were known.

Since our estimation scheme is based on pairs of observations for which $W_i \equiv \Delta w_i\gamma \cong 0$, it is obvious that additional smoothness conditions are required. These are imposed by Assumptions R4-R8. Notice in particular Assumption R5, which imposes a Lipschitz continuity property on the selection correction function $\Lambda(\cdot)$. It is easy to see that simple continuity will not be sufficient to guarantee that $\Delta\lambda_i \to 0$ as $W_i \to 0$, since $\Delta\lambda_i$ is not a function of $W_i$. Furthermore, similarly to kernel density and regression estimation, a high order of differentiability $r$ for certain functions of the index $W$, along with the appropriate choice of the kernel function and the bandwidth sequence, implies a faster rate of convergence in distribution for $\tilde\beta_n$. Specifically, we choose an $(r + 1)$th order bias-reducing kernel, which by Assumption R7(e) is required to be negative in part of its domain.

9 Notice that by Assumption R1 the functional form of $\Lambda$ is the same over time for the same individual, while by Assumption R2 it is also the same across individuals.

10 In principle we could dispense with the assumption that $\bar\Lambda$ is bounded by assuming that it has finite fourth moment conditional on $W$.

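The next lemma establishes the asymptotic properties of the infeasible estimator $\tilde\beta_n$.

LEMMA 1: Let Assumptions R1-R8 hold. Define

$$\Sigma_{xx} \equiv f_W(0)\,E(\Delta x'\Delta x \mid W = 0),$$

$$\Sigma_{xv} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Delta v^2 \mid W = 0)\int K(v)^2\,dv,$$

$$\Sigma_{x\lambda} \equiv \frac{1}{r!}\,g^{(r)}(0)\int v^{r+1}K(v)\,dv,$$

where $g^{(r)}(0)$ is the $(k \times 1)$ vector of $r$th-order derivatives of $g(W) \equiv E(\Delta x'\bar\Lambda \mid W)\,f_W(W)$, evaluated at $W = 0$. Then:

(a) $S_{xx} \xrightarrow{p} \Sigma_{xx}$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \leq \tilde h < \infty$, then (i) $\sqrt{nh_n}\,S_{xv} \xrightarrow{d} N(0, \Sigma_{xv})$ and (ii) $\sqrt{nh_n}\,S_{x\lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\lambda}$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}S_{xv} \xrightarrow{p} 0$ and (ii) $h_n^{-(r+1)}S_{x\lambda} \xrightarrow{p} \Sigma_{x\lambda}$.

The asymptotic properties of $\tilde\beta_n$ easily follow from the previous Lemma. If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\tilde\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\tilde\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.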

In order to derive the asymptotic properties of the feasible estimator $\hat\beta_n$, we will make the following additional assumptions.

ASSUMPTION R9: In addition to the conditions of Assumption R7, the kernel function satisfies: (a) $K(v)$ is three times continuously differentiable with bounded derivatives, and (b) $\int |K'(v)|\,dv$, $\int |K''(v)|\,dv$, $\int v^2|K'(v)|\,dv$, and $\int v^2|K''(v)|\,dv$ are finite.

The conditions of Assumption R9 are satisfied, for example, for $K(v)$ being the standard normal density function, which is a second order kernel.

ASSUMPTION R10: $x^*_t$, $\varepsilon^*_t$, and $w_t$ have bounded $8 + 4\delta$ moments conditional on $W$, for some $\delta \in (0, 1)$. In addition, $E(\Delta x_l\,\Delta v\,\Delta w_j \mid W)$ and $E(\Delta x_l\,\Delta v\,\Delta w_j\,\Delta w_m \mid W)$ are continuous at $W = 0$, for all $l = 1, \ldots, k$ and $j, m = 1, \ldots, q$.

ASSUMPTION R11: The parameter vector $\gamma$ in the selection equation lies in a compact$^{11}$ set, and $\hat\gamma_n$ is a consistent estimator that satisfies $\hat\gamma_n - \gamma = O_p(n^{-p})$, where $2/5 < p \leq 1/2$.

For example, $p = 1/2$ if $\gamma$ is estimated by maximizing the conditional likelihood function.

ASSUMPTION R12: $h_n = h \cdot n^{-\mu}$, where $0 < h < \infty$ and $1 - 2p < \mu < p/2$.

Assumption R12 is crucial for establishing the result that follows. This result states that $\hat S_{xx}$, $\hat S_{xv}$, and $\hat S_{x\lambda}$ have the same probability limits as their infeasible counterparts $S_{xx}$, $S_{xv}$, and $S_{x\lambda}$, provided that the bandwidth sequence $h_n$ is chosen appropriately for any given rate of convergence of the first-step estimator, that is, for any given $p$, and for any degree of smoothness $r$.

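LEMMA 2: Let Assumptions R1-R12 hold. Then:

(a) $\hat S_{xx} - S_{xx} = o_p(1)$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \leq \tilde h < \infty$, then (i) $\sqrt{nh_n}(\hat S_{xv} - S_{xv}) = o_p(1)$ and (ii) $\sqrt{nh_n}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}(\hat S_{xv} - S_{xv}) = o_p(1)$ and (ii) $h_n^{-(r+1)}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

Lemma 2 readily implies that if $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\hat\beta_n - \tilde\beta_n) = o_p(1)$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \tilde\beta_n) = o_p(1)$. Since $(\hat\beta_n - \beta) = (\hat\beta_n - \tilde\beta_n) + (\tilde\beta_n - \beta)$, we have the following theorem.

THEOREM 1: Let Assumptions R1-R12 hold.

(a) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \leq \tilde h < \infty$, then $\sqrt{nh_n}(\hat\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.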

11 Compactness of the parameter space is required for consistency of both Manski's estimator and the smoothed conditional maximum score estimator, while it is not required for the conditional maximum likelihood estimator. Notice, though, that since $\gamma$ can only be estimated up to scale, we can always normalize it so that it lies on the unit circle. Thus, the compactness assumption is not restrictive.

Thus, in the limit, the fact that we are using $\hat\gamma_n$ to estimate $\beta$ does not affect the asymptotic distribution of $\hat\beta_n$. The lower bound on $\mu$ imposed by Assumption R12 is the key for this result to hold. In words, this bound implies that $\beta$ is estimated at a rate slower than $\gamma$. Indeed, from Theorem 1, the rate of convergence of $\hat\beta_n$ is $(nh_n)^{-1/2} = n^{-(1-\mu)/2}$, which is obviously slower than $n^{-p}$ since $\mu > 1 - 2p$. Thus, in effect, Assumption R12 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$.

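In principle, we could allow $\beta$ to be estimated at the same rate as $\gamma$. Thus, if $\sqrt{nh_n}(\hat\gamma_n - \gamma) = O_p(1)$, for $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ we obtain the following asymptotic representation, which may be easily derived from the analysis of Lemma 2(b) in the Appendix:

$$\sqrt{nh_n}(\hat\beta_n - \beta) = \sqrt{nh_n}(\tilde\beta_n - \beta) + \Sigma_{xx}^{-1}\,\Omega\,\sqrt{nh_n}(\hat\gamma_n - \gamma) + o_p(1),$$

where

$$\Omega \equiv \operatorname{plim}\ \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n^2}K'\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\,\Delta w_i\,\Delta\lambda_i\,\Phi_i,$$

provided that $E(\Delta x'\Delta w\,\bar\Lambda \mid W)$ is continuous at $W = 0$ and $vK(v) \to 0$ as $|v| \to \infty$. Asymptotic normality of $\hat\beta_n$ may still be established if $\sqrt{nh_n}(\hat\gamma_n - \gamma)$ has an asymptotic representation of the form $\sqrt{nh_n}(\hat\gamma_n - \gamma) = \frac{1}{\sqrt{nh_n}}\sum_{i=1}^{n}\psi_n(\Delta w_i, \Delta d_i; \gamma) + o_p(1)$.$^{12}$

At first glance it looks attractive to eliminate the asymptotic bias of $\hat\beta_n$ by choosing $h_n$ so that $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h = 0$, or equivalently by setting $\mu > 1/(2(r + 1) + 1)$. In that case, however, the rate of convergence of $\hat\beta_n$ is lower than when $\tilde h > 0$. Indeed, the rate of convergence in distribution of $\hat\beta_n$ is maximized by making $\mu$ as small as possible, that is, by setting $\mu = 1/(2(r + 1) + 1)$, in which case it becomes $n^{-(r+1)/(2(r+1)+1)}$. Thus, for $r$ large enough, the estimator converges at a rate that can be arbitrarily close to $n^{-1/2}$, provided also that $\gamma$ is estimated fast enough, that is, provided $p > (r + 1)/(2(r + 1) + 1)$.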
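The rate arithmetic in this passage can be checked mechanically. The following is a minimal sketch and not part of the original paper; the function name `bandwidth_rates` and its return convention are assumptions made for illustration. It uses exact fractions to compute the bandwidth exponent $\mu = 1/(2(r+1)+1)$, the implied convergence exponent $(r+1)/(2(r+1)+1)$, and whether $\mu$ satisfies the bounds $1 - 2p < \mu < p/2$ of Assumption R12 for a first step converging at rate $n^{-p}$.

```python
from fractions import Fraction


def bandwidth_rates(r, p):
    """Hypothetical helper: exponents implied by smoothness r and first-step rate n^{-p}.

    Returns (mu, rate, admissible), where h_n = h * n^{-mu}, the estimator
    converges at rate n^{-rate}, and `admissible` reports whether mu lies
    strictly inside the bounds 1 - 2p < mu < p/2 of Assumption R12.
    """
    p = Fraction(p)
    mu = Fraction(1, 2 * (r + 1) + 1)        # bandwidth exponent
    rate = Fraction(r + 1, 2 * (r + 1) + 1)  # convergence exponent of the estimator
    admissible = (1 - 2 * p) < mu < (p / 2)
    return mu, rate, admissible


if __name__ == "__main__":
    for r in (1, 3, 5):
        mu, rate, ok = bandwidth_rates(r, Fraction(1, 2))
        print(f"r={r}: mu={mu}, rate exponent={rate}, R12 satisfied={ok}")
```

With a second order kernel ($r = 1$) and a root-$n$ first step ($p = 1/2$), the helper returns $\mu = 1/5$ and a rate exponent of $2/5$; as $r$ grows, the rate exponent approaches $1/2$ from below.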

12 We can also derive an asymptotic representation for $\hat\beta_n$ in the case where $\gamma$ is estimated at rate $n^{-p}$ that is slower than $1/\sqrt{nh_n}$. In this case we obtain $n^{p}(\hat\beta_n - \beta) = \Sigma_{xx}^{-1}\,\Omega\,n^{p}(\hat\gamma_n - \gamma) + o_p(1)$, which implies that $\hat\beta_n$ converges at the same rate as $\hat\gamma_n$, which is slower than the optimal rate obtained for the infeasible estimator $\tilde\beta_n$, that is, when $\gamma$ is known.

Although the proposed estimator is asymptotically biased, it is possible to eliminate the asymptotic bias while maintaining the maximal rate of convergence, in the manner suggested by Bierens (1987).

COROLLARY: Let $\hat\beta_n$ be the estimator with window width $h_n = h\,n^{-1/(2(r+1)+1)}$, and $\hat\beta_{n,\delta}$ the estimator with window width $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, where $\delta \in (0, 1)$. Define

$$\hat{\hat\beta}_n \equiv \frac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\,\hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}}.$$

Then $n^{(r+1)/(2(r+1)+1)}(\hat{\hat\beta}_n - \beta) \xrightarrow{d} N(0,\ h^{-1}\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.
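The linear combination in the Corollary is simple to compute once the two estimates are available. The sketch below is illustrative only; the function name `bias_corrected` and its argument names are assumptions, and the two input estimates are taken as given (computed with bandwidth exponents $1/(2(r+1)+1)$ and $\delta/(2(r+1)+1)$, respectively).

```python
import numpy as np


def bias_corrected(beta_hat, beta_hat_delta, n, r, delta):
    """Hypothetical helper: combine two estimates as in the Corollary.

    beta_hat       -- estimate using h_n = h * n^{-1/(2(r+1)+1)}
    beta_hat_delta -- estimate using h_{n,delta} = h * n^{-delta/(2(r+1)+1)}
    """
    # Weight c = n^{-(1 - delta)(r + 1)/(2(r + 1) + 1)}
    c = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    return (np.asarray(beta_hat) - c * np.asarray(beta_hat_delta)) / (1.0 - c)


if __name__ == "__main__":
    # Stylized check: if each estimate equals the truth plus a bias
    # proportional to its bandwidth raised to the power r + 1, the
    # combination removes that bias exactly.
    n, r, delta, truth, b = 1000, 1, 0.1, 1.0, 2.0
    e = (r + 1) / (2 * (r + 1) + 1)
    est = truth + b * n ** (-e)
    est_delta = truth + b * n ** (-delta * e)
    print(bias_corrected(est, est_delta, n, r, delta))
```

The stylized check works because the leading bias terms of the two estimates are proportional to $n^{-(r+1)/(2(r+1)+1)}$ and $n^{-\delta(r+1)/(2(r+1)+1)}$, so the weighted difference cancels them.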

In order to compute $\hat\beta_n$ or $\hat{\hat\beta}_n$ in an application, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator will likely be more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant $h$. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.

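For a given order of differentiability $r$ and a given sample size $n$, the results of Theorem 1 suggest that $h_n = h \cdot n^{-\mu}$ be chosen so that $\mu = 1/(2(r + 1) + 1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant $h$. A natural way to proceed (see Horowitz (1992) and Härdle (1990)) is to choose $h$ so as to minimize some kind of measure of the distance of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

$$MSE \equiv h^{2(r+1)}\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda} + h^{-1}\,\mathrm{trace}\!\left[A\,\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}\right]$$

for any nonstochastic positive semidefinite matrix $A$ that satisfies $\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda} \neq 0$. It is straightforward to show that MSE is minimized by setting

$$(3.21)\qquad h = h^* \equiv \left[\frac{\mathrm{trace}\!\left[A\,\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}\right]}{2(r + 1)\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda}}\right]^{1/(2(r+1)+1)}.$$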
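To make equation (3.21) concrete, the following sketch evaluates the optimal constant from given values of the three matrices. It is an illustration under stated assumptions, not the author's code: the function name `optimal_bandwidth_constant` is hypothetical, the inputs stand for $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ (or consistent estimates of them), and the weighting matrix $A$ defaults to the identity as an arbitrary choice.

```python
import numpy as np


def optimal_bandwidth_constant(sigma_xx, sigma_xv, sigma_xlam, r, A=None):
    """Hypothetical helper: evaluate the MSE-minimizing constant of equation (3.21)."""
    sigma_xx = np.atleast_2d(np.asarray(sigma_xx, dtype=float))
    sigma_xv = np.atleast_2d(np.asarray(sigma_xv, dtype=float))
    sigma_xlam = np.asarray(sigma_xlam, dtype=float).reshape(-1)
    k = sigma_xx.shape[0]
    A = np.eye(k) if A is None else np.atleast_2d(np.asarray(A, dtype=float))

    sxx_inv = np.linalg.inv(sigma_xx)
    bias = sxx_inv @ sigma_xlam            # Sigma_xx^{-1} Sigma_xlam
    var = sxx_inv @ sigma_xv @ sxx_inv     # Sigma_xx^{-1} Sigma_xv Sigma_xx^{-1}

    numerator = np.trace(A @ var)
    denominator = 2 * (r + 1) * (bias @ A @ bias)
    if denominator <= 0:
        raise ValueError("bias term must be nonzero for the formula to apply")
    return (numerator / denominator) ** (1.0 / (2 * (r + 1) + 1))


if __name__ == "__main__":
    # Scalar illustration with made-up inputs (not values from the paper).
    print(optimal_bandwidth_constant([[1.0]], [[4.0]], [1.0], r=1))
```

The formula follows from the first-order condition of the MSE expression: differentiating $h^{2(r+1)}B + h^{-1}V$ with respect to $h$ and setting the result to zero gives $h^{2(r+1)+1} = V / (2(r+1)B)$.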

This last expression suggests that we may construct a consistent estimate of $h^*$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{xv}$ and $\Sigma_{x\lambda}$.

THEOREM 2: Assume that Assumptions R1-R12 hold. (a) Let $\hat\beta_n$ be a consistent estimator of $\beta$ based on $h_n = h\,n^{-1/(2(r+1)+1)}$, and define $\hat e_i \equiv \Delta y_i - \Delta x_i\hat\beta_n$. Then

(b) Let $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then, for $\hat e_i$ defined as in part (a),

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's world wide web page.

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given $r$ and $n$, choose any $h_n = h\,n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, with $h$ an arbitrary positive constant and $0 < \delta < 1$. Compute $\hat\beta_n$ based on $h_n$ and construct $\hat e_i$ as defined in Theorem 2. Use $\hat e_i$ to compute the estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ as discussed above. Then estimate $h^*$ by $\hat h^*$, using equation (3.21) with $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h^*$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the plug-in method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant $h$. Second, by specifying the order of differentiability $r$, the researcher is restricted to a certain smoothness class.

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h\,n^{-1/(2(r+1)+1)}$, run an OLS regression of $\tilde y_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta y_i\,\Phi_i$ on $\tilde x_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta x_i\,\Phi_i$ and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix, using the residuals from the regression, $\hat e_i = \tilde y_i - \tilde x_i\hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\tilde y_i$ on $\tilde x_i$ with bandwidth $h_n = h_{n,\delta}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$ in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency$^{14}$ of $\hat\beta_n$ if $h_n^{-2}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e., for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \leq \tilde h < \infty$.

The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-$n$ asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.
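The weighted OLS recipe described earlier in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: the function names `normal_kernel` and `kernel_weighted_fd` and the argument layout are hypothetical. The code solves the kernel-weighted least squares problem directly, which is algebraically the same as the OLS regression on square-root-weighted data but avoids taking square roots of kernel values that may be negative for higher order kernels. The covariance matrix is the usual heteroskedasticity-robust sandwich form.

```python
import numpy as np


def normal_kernel(v):
    """Standard normal density, a second order kernel."""
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)


def kernel_weighted_fd(dy, dx, dw, gamma_hat, h_n, both_selected, kernel=normal_kernel):
    """Hypothetical helper: kernel-weighted first-difference estimate and robust s.e.

    dy, dx, dw    -- first differences of outcome, regressors, selection covariates
                     (rows with both_selected == 0 must still hold finite numbers)
    gamma_hat     -- first-step estimate of the selection coefficients
    both_selected -- indicator Phi_i = d_i1 * d_i2
    """
    dy = np.asarray(dy, dtype=float)
    dx = np.asarray(dx, dtype=float).reshape(len(dy), -1)
    dw = np.asarray(dw, dtype=float).reshape(len(dy), -1)
    phi = np.asarray(both_selected, dtype=float)

    # Kernel weight K(dw_i * gamma_hat / h_n) times the selection indicator.
    w = kernel(dw @ np.asarray(gamma_hat, dtype=float) / h_n) * phi

    xtwx = dx.T @ (dx * w[:, None])
    beta = np.linalg.solve(xtwx, dx.T @ (w * dy))

    # Sandwich covariance: the weighted score of observation i is w_i * x_i * e_i.
    resid = dy - dx @ beta
    score = dx * (w * resid)[:, None]
    bread = np.linalg.inv(xtwx)
    cov = bread @ (score.T @ score) @ bread
    return beta, np.sqrt(np.diag(cov))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    dw = rng.normal(size=(n, 2))
    dx = rng.normal(size=(n, 1))
    sel = rng.random(n) < 0.5
    dy = 2.0 * dx[:, 0] + 0.1 * rng.normal(size=n)  # toy data without selection bias
    print(kernel_weighted_fd(dy, dx, dw, [0.7, 0.7], n ** (-0.2), sel))
```

The toy data in the usage example contain no selection effect, so they only exercise the mechanics; they say nothing about the estimator's behavior under the paper's design.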

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3}h_n \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator, which under weak (but stronger than Manski's) assumptions is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

14 Consistency of $\hat\beta_n$ may be established under the weaker restriction that $h_n^{-1}\|\hat\gamma_n - \gamma\| = o_p(1)$. The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion. This modification does not alter the basic restriction for obtaining an asymptotic distribution for $\hat\beta_n$ which does not depend on the estimation of $\gamma$ in the first step, namely that $\gamma$ has to be estimated at a faster rate than $\beta$. Notice that in this case the upper bound on $\mu$ in Assumption R12 would have to be replaced by $(6p - 1)/7$. However, this modification would affect the proof of Theorem 2, which would become unnecessarily complicated and long.

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, and the first step estimation method; of the performance in practice of the proposed plug-in method for estimating the bandwidth constant; and, finally, of the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where $\beta_0 = 1$; $\gamma_1 = \gamma_2 = 1$; $w_{1it}$ and $w_{2it}$ are independent $N(-1, 1)$ variables; $\eta_i = (w_{1i1} + w_{1i2})/2 + 2\xi_{1i}$, with $\xi_{1i}$ an independent variable distributed uniformly over the interval $(0, 1)$; $u_{it}$ is logistically distributed, normalized to have variance equal to 1; $x_{it} = w_{2it}$; $\alpha_i = (w_{2i1} + w_{2i2})/2 + \xi_{2i}$, with $\xi_{2i}$ an independent $N(0, 2)$ variable; and $\varepsilon_{it} = 0.8\,\xi_{3it} + 0.6\,u_{it}$, with $\xi_{3it}$ an independent standard normal variable. All data are generated i.i.d. across individuals and over time. This design implies that $\Pr(d_{i1} + d_{i2} = 1) = 0.37$ and $\Pr(d_{i1} = d_{i2} = 1) = 0.31$, so that approximately 37 percent of each sample is used in the first step estimation of the selection equation, and approximately 31 percent in the second step. Each Monte Carlo experiment is performed 1000 times, while the same pseudorandom number sequences are used for each one of three different sample sizes $n$: 250, 1000, and 4000.

Table I presents the finite sample properties of the naive estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences, using only those individuals that are selected into the sample in both time periods, i.e., those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$ at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$
Columns: Mean Bias, Median Bias, RMSE, MAD

Panel B: Sizes of t tests
Nominal sizes: 0.01, 0.05, 0.10, 0.20
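The four summary statistics reported in the tables are straightforward to compute from a vector of replication estimates. The sketch below is illustrative only; the function name `mc_summary` is hypothetical, and the MAD is computed here as the median absolute deviation of the estimates from the true value, which is an assumption since the text does not spell out the centering.

```python
import numpy as np


def mc_summary(estimates, true_value):
    """Hypothetical helper: Monte Carlo summary statistics across replications."""
    err = np.asarray(estimates, dtype=float) - true_value
    return {
        "mean_bias": float(err.mean()),
        "median_bias": float(np.median(err)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        # Assumption: deviations are measured from the true value.
        "mad": float(np.median(np.abs(err))),
    }


if __name__ == "__main__":
    # Made-up estimates for illustration, not results from the paper.
    print(mc_summary([0.9, 1.0, 1.4], true_value=1.0))
```

Reporting the median-based measures alongside the mean-based ones guards against replications with extreme estimates, which matters when the estimator may lack finite moments in finite samples.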

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(v) = \phi(v)$, where $\phi$ is the density of the standard normal distribution, which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h\,n^{-1/(2(r+1)+1)} = h\,n^{-1/5}$ with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.$^{15}$ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_{CL}$, which in this case will be consistent since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,$^{16}$ denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator,$^{17}$ denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Left block: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right block: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).
Columns in each block: Mean Bias, Median Bias, RMSE, MAD.

Panel A (True $\gamma$): 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B ($\hat\gamma_{CL}$): 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C ($\hat\gamma_{CMS}$): 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D ($\hat\gamma_{SCMS4}$): 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E ($\hat\gamma_{SCMS2}$): 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$ as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E) the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.

15 In the construction of the kernel weights of both the infeasible estimator $\tilde\beta_n$ of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i=1}^{n}\Delta d_i\,1\{\Delta w_{1i}g_1 + \Delta w_{2i}g_2 \geq 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.

17 The SCMS estimates are computed by maximizing

over all $g \in \Re^2$ that have $|g_2| = 1$ and $g_1$ in a compact subset of $\Re$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(v) = K_4(v)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(v) = \Phi(v)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5\,n^{-1/(2m+1)}$ ($m = 2$ or $4$), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).
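The grid search used for the conditional maximum score (CMS) estimates in footnote 16 can be sketched as follows. This is an illustration under stated assumptions rather than the routine used in the paper: the function name `cms_grid_search` is hypothetical, the score is taken to be the sample average of $\Delta d_i$ times the indicator that the differenced index is nonnegative, and ties in the maximum are broken by taking the first grid point.

```python
import numpy as np


def cms_grid_search(dd, dw, n_grid=2000):
    """Hypothetical helper: maximize the conditional maximum score over the unit circle.

    dd -- differences d_i2 - d_i1 (zero for individuals who do not switch)
    dw -- (n, 2) array of differenced selection covariates
    """
    dd = np.asarray(dd, dtype=float)
    dw = np.asarray(dw, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid)
    # Candidate directions (g1, g2) = (sin g, cos g), one row per grid point.
    g = np.column_stack([np.sin(angles), np.cos(angles)])
    index_nonneg = (dw @ g.T >= 0).astype(float)   # shape (n, n_grid)
    scores = dd @ index_nonneg / len(dd)
    best = int(np.argmax(scores))
    return g[best], float(scores[best])


if __name__ == "__main__":
    # Toy panel with logistic errors; the design is made up for illustration.
    rng = np.random.default_rng(1)
    n = 3000
    gamma = np.array([2.0, 2.0])
    w1, w2 = rng.normal(size=(n, 2)), rng.normal(size=(n, 2))
    eta = rng.normal(size=n)
    d1 = (w1 @ gamma + eta - rng.logistic(size=n) >= 0).astype(float)
    d2 = (w2 @ gamma + eta - rng.logistic(size=n) >= 0).astype(float)
    print(cms_grid_search(d2 - d1, w2 - w1))
```

Because the objective is a step function of the direction, a grid search sidesteps the difficulty that gradient-based optimizers have with it; the estimate is identified only up to scale, which is why the search runs over unit vectors.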

In Table I11 we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel

A function Panels A and B present the results for b and P using a bandwidth constant h equal to 05 and 3 respectively and a second order bias-reducing kernel As expected the estimators bias increases as we increase the bandwidth while the RMSE decreases The increase in both mean and median bias appears quite large which indicates that point estimates may be quite sensitive to the choice of bandwidth In order to give a sense of the precision with which these biases are estimated we provide at the bottom of Table I11 their estimated standard errors for the two sets of experiments that use 05 and 3 as bandwidth constant (Panels A and B) ~

In Panels C and D we use a fourth and a sixth order bias-reducing kernel19 and set h =n-1(2(+l)) with r = 3 and r = 5 respectively A comparison of Panels 11-A and 111-C and 111-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator although there does not appear to be much gain from increasing the order of the kernel from four to six

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h$ is larger than the initial constant (here the initial bandwidth constant is set equal to one^20). Table V displays the mean of $\hat h$ across 1000 replications for different specifications of the initial constant, for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).
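The calculation described in footnote 18 can be sketched as follows. This is a minimal illustration, not the author's code: it assumes the rule of thumb meant is the familiar $1.06\,\hat\sigma\,n^{-1/5}$ for a normal kernel, and it uses the standard large-sample formula $1/(2 f(m)\sqrt{n})$ for the standard error of a sample median $m$. All function names are hypothetical.

```python
import math
import random
import statistics


def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) (assumed form of the rule)."""
    n = len(sample)
    return 1.06 * statistics.stdev(sample) * n ** (-1 / 5)


def normal_kernel_density(sample, point, bandwidth):
    """Kernel density estimate at `point` using a standard normal kernel."""
    n = len(sample)
    total = sum(math.exp(-0.5 * ((point - x) / bandwidth) ** 2) for x in sample)
    return total / (n * bandwidth * math.sqrt(2.0 * math.pi))


def median_standard_error(sample):
    """Large-sample standard error of the median: 1 / (2 f(median) sqrt(n))."""
    n = len(sample)
    med = statistics.median(sample)
    density_at_median = normal_kernel_density(sample, med, silverman_bandwidth(sample))
    return 1.0 / (2.0 * density_at_median * math.sqrt(n))


# Illustration on simulated data standing in for 1000 Monte Carlo estimates.
rng = random.Random(0)
draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]
se_median = median_standard_error(draws)
```

In a Monte Carlo setting, `draws` would be replaced by the replicated estimates; the standard error of the median bias is then the standard error of their median.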

19 The fourth-order kernel is $K_4(\nu) = 1.1\exp(-\nu^2/2) - 0.1\exp(-\nu^2/(2 \cdot 11))(1/\sqrt{11})$, and the sixth-order kernel is $K_6(\nu) = 1.5\exp(-\nu^2/2) - 0.6\exp(-\nu^2/(2 \cdot 4))(1/\sqrt{4}) + 0.1\exp(-\nu^2/(2 \cdot 9))(1/\sqrt{9})$. See Bierens (1987).
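The bias-reducing property of the two kernels in footnote 19 can be checked numerically. The sketch below is illustrative only; it assumes a normalizing factor $1/\sqrt{2\pi}$ (not visible in the footnote as extracted) so that each kernel integrates to one, as Assumption R7(a) requires. The helper names and the midpoint-rule integration are choices made here, not part of the paper.

```python
import math


def gaussian_mixture_kernel(v, weights, variances):
    """Weighted sum of centered normal densities with the given variances."""
    total = 0.0
    for theta, var in zip(weights, variances):
        total += theta * math.exp(-v * v / (2.0 * var)) / math.sqrt(var)
    # Assumed normalization so that the kernel integrates to one.
    return total / math.sqrt(2.0 * math.pi)


def k4(v):
    """Fourth-order kernel: weights (1.1, -0.1), variances (1, 11)."""
    return gaussian_mixture_kernel(v, (1.1, -0.1), (1.0, 11.0))


def k6(v):
    """Sixth-order kernel: weights (1.5, -0.6, 0.1), variances (1, 4, 9)."""
    return gaussian_mixture_kernel(v, (1.5, -0.6, 0.1), (1.0, 4.0, 9.0))


def moment(kernel, j, lower=-60.0, upper=60.0, steps=40000):
    """Midpoint-rule approximation of the integral of v^j * kernel(v)."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        v = lower + (i + 0.5) * width
        total += (v ** j) * kernel(v)
    return total * width
```

Under these assumptions `moment(k4, 2)` and `moment(k6, 4)` are numerically zero, while `moment(k4, 4)` is not, which is what separates a fourth-order from a sixth-order kernel. Both kernels take negative values on part of their domain, for example `k4(3.0) < 0`.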

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size n = 100000.

1355 SAMPLE SELECTION MODEL

TABLE III

FINITE SAMPLE PROPERTIES OF $\tilde\beta_n$ AND $\tilde{\tilde\beta}_n$; TRUE $\gamma$

Column groups: $\tilde\beta_n$ (Without Asymptotic Bias Correction); $\tilde{\tilde\beta}_n$ (With Asymptotic Bias Correction)

Columns within each group: Mean Bias, Median Bias, RMSE, MAD

Panel A: $K(\nu) = \phi(\nu)$, $h_n = 0.5\,n^{-1/5}$
0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B: $K(\nu) = \phi(\nu)$, $h_n = 3\,n^{-1/5}$
0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C: $K(\nu) = K_4(\nu)$, $h_n = n^{-1/9}$
0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D: $K(\nu) = K_6(\nu)$, $h_n = n^{-1/13}$
0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for n = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

The estimated standard errors of the median bias estimates for n = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.
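The four summary statistics reported in the tables can be computed from a vector of replicated estimates along the following lines. This is a sketch under stated assumptions: "MAD" is taken here to mean the median absolute deviation of the estimates from the true parameter value, which the extracted text does not define, and the function name is hypothetical.

```python
import math
import statistics


def monte_carlo_summary(estimates, true_value):
    """Mean bias, median bias, RMSE and MAD of replicated estimates."""
    errors = [b - true_value for b in estimates]
    squared = [e * e for e in errors]
    absolute = [abs(e) for e in errors]
    return {
        "mean_bias": statistics.mean(errors),
        "median_bias": statistics.median(errors),
        "rmse": math.sqrt(statistics.mean(squared)),
        # Assumed definition: median absolute deviation from the true value.
        "mad": statistics.median(absolute),
    }


summary = monte_carlo_summary([0.9, 1.0, 1.2, 1.3], true_value=1.0)
```

With 1000 replications, the standard error of the mean bias is the standard deviation of the estimates divided by $\sqrt{1000}$, which is consistent in magnitude with the figures in the table note.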

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = \hat h\,n^{-1/5}$, INITIAL $h = 1$, $K(\nu) = \phi(\nu)$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction); $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)

Columns within each group: Mean Bias, Median Bias, RMSE, MAD

Panel A: True $\gamma$
0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B: $\hat\gamma_L$
0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C: $\hat\gamma_{CMS}$
0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D: $\hat\gamma_{SCMS4}$
0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401

TABLE V

Columns: Initial $h = 0.5$; Initial $h = 1$; Initial $h = 2$; Initial $h = 3$

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a normal; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$. [Plots not reproduced; surviving panel title: True $\gamma$.]

Note: Figures 1a, 1d, 1g: n = 250. Figures 1b, 1e, 1h: n = 1000. Figures 1c, 1f, 1i: n = 4000.
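The points of such a quantile-quantile plot can be generated as in the sketch below. It is only an illustration: the plotting positions $(i + 0.5)/n$ are an assumption made here (the paper does not state which convention it uses), and the function name is hypothetical.

```python
import random
import statistics


def normal_qq_pairs(estimates):
    """Pairs (normal quantile, empirical quantile) for a Q-Q plot.

    The reference normal has the sample mean and sample standard
    deviation of the estimates, as described in the text.
    """
    n = len(estimates)
    reference = statistics.NormalDist(statistics.mean(estimates),
                                      statistics.stdev(estimates))
    ordered = sorted(estimates)
    # Plotting positions (i + 0.5) / n are an assumed convention.
    return [(reference.inv_cdf((i + 0.5) / n), value)
            for i, value in enumerate(ordered)]


# Simulated stand-in for 1000 Monte Carlo estimates of a coefficient equal to 1.
rng = random.Random(1)
simulated = [rng.gauss(1.0, 0.1) for _ in range(1000)]
pairs = normal_qq_pairs(simulated)
```

If the finite sample distribution is close to normal, the pairs lie near the 45-degree line.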

Finally, we examine the size of t tests where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$. To this end we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (322). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
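The empirical size calculation behind Table VI can be sketched as follows. The sketch assumes two-sided tests with standard normal critical values, which the text does not state explicitly; the function name and the simulated inputs are hypothetical.

```python
import random
import statistics


def rejection_rates(estimates, std_errors, true_value,
                    levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of replications in which |t| exceeds the normal critical value."""
    std_normal = statistics.NormalDist()
    t_stats = [(b - true_value) / se for b, se in zip(estimates, std_errors)]
    rates = {}
    for level in levels:
        # Two-sided test (assumed): reject when |t| > z_{1 - level/2}.
        critical = std_normal.inv_cdf(1.0 - level / 2.0)
        rejected = sum(1 for t in t_stats if abs(t) > critical)
        rates[level] = rejected / len(t_stats)
    return rates


# Stand-in data: exactly normal estimates with correct standard errors.
rng = random.Random(2)
sim_estimates = [rng.gauss(1.0, 0.1) for _ in range(20000)]
sim_std_errors = [0.1] * len(sim_estimates)
sizes = rejection_rates(sim_estimates, sim_std_errors, true_value=1.0)
```

When the normal approximation and the standard errors are accurate, each rejection rate should be close to its nominal level, which is the comparison the table is designed to support.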

TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction); $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)

Nominal levels within each group: 0.01, 0.05, 0.10, 0.20

Panel A: True $\gamma$
0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B: $\hat\gamma_L$
0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C: $\hat\gamma_{CMS}$
0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D: $\hat\gamma_{SCMS4}$
0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250

5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMA A1: Let $S_n = (1/n)\sum_{i=1}^{n} (1/h_n) L(W_i/h_n) Z_i W_i^{s}$, $s \ge 0$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution that has $E(|Z|^2 \mid W) < M < \infty$ for almost all $W$, and the function $L$ satisfies $\int |v^{s} L(v)|\,dv < M$. Then $E(S_n) = O(h_n^{s})$ and $\mathrm{var}(S_n) = O(h_n^{2s}/(nh_n))$. Thus, for $s \ge 1$, $S_n \xrightarrow{p} 0$, while for $s = 0$, $S_n \xrightarrow{p} f_W(0)\,E(Z \mid W = 0)\int L(v)\,dv$, provided that $E(Z \mid W)$ is continuous at $W = 0$.

PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

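LEMMA A2 (Liapounov CLT for double arrays): Let $S_n = (1/\sqrt{n})\sum_{i=1}^{n} \xi_{in}$, where $\{\xi_{in}\}$ is an independent sequence of scalar random variables that satisfies $E(\xi_{in}) = 0$, $\mathrm{var}(\xi_{in}) < \infty$, $\mathrm{var}(S_n) \to V < \infty$, and $\sum_{i=1}^{n} E|\xi_{in}/\sqrt{n}|^{2+\delta} \to 0$ for some $\delta \in (0,1)$ as $n \to \infty$. Then $S_n \xrightarrow{d} N(0, V)$.

PROOF: See Theorem 7.1.2 and the comment on page 209 in Chung (1973).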

COROLLARY A1: Let $S_n = (1/n)\sum_{i=1}^{n} (1/h_n) L(W_i/h_n) Z_i$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution such that $E(Z \mid W) = 0$ and $E(|Z|^{2+\delta} \mid W) < M < \infty$ for almost all $W$, $E(Z^2 \mid W)$ is continuous at $W = 0$, and the function $L$ satisfies $\int |L(v)|^{2+\delta}\,dv < \infty$. Then $\sqrt{nh_n}\,S_n \xrightarrow{d} N\bigl(0,\; f_W(0)\,E(Z^2 \mid W = 0)\int L(v)^2\,dv\bigr)$.

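PROOF OF LEMMA 1: (a) Apply Lemma A1 with $Z = \Delta x_l\,\Delta x_j\,\Phi$ ($l, j = 1, \ldots, k$), $s = 0$, and $L(v) = K(v)$.

(b-i) Apply Lemma A2 with $\xi_{in} = c'(1/\sqrt{h_n})\,K(W_i/h_n)\,\Delta x_i'\,\Delta\nu_i\,\Phi_i$, where $c$ is a $k \times 1$ vector of constants such that $c'c = 1$.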

(b-ii) Note that by Assumption R5, $\Delta\lambda_i = \Lambda_i W_i$. Thus we may write

$S_{x\lambda} = (1/n)\sum_{i=1}^{n} (1/h_n)\,K(W_i/h_n)\,\Delta x_i'\,\Lambda_i\,W_i\,\Phi_i$.

Therefore $E(S_{x\lambda}) = \int (1/h_n)\,K(W/h_n)\,W\,g(W)\,dW$, where $g(W) \equiv E(\Delta x'\Lambda \mid W)\,f_W(W)$ is by assumption $r$ times continuously differentiable with derivatives that are bounded on the support of $W$, and has $g(0) < \infty$. A Taylor series expansion of $g(\cdot)$ around 0 and a change of variables $W = v h_n$ lead to

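for some $c_i$ lying between 0 and $W$, since $\int v^{j} K(v)\,dv = 0$ for $j = 1, \ldots, r$. Therefore, by bounded convergence, $\sqrt{nh_n}\,E(S_{x\lambda}) \to \tilde h\,\Sigma_{x\lambda}$, since under our assumptions $\int |v|^{r+1} |K(v)|\,dv < \infty$ and by assumption $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$. Furthermore, by Lemma A1, $\mathrm{var}(S_{x\lambda}) = O(h_n^{2}/(nh_n))$, which implies that $\mathrm{var}(\sqrt{nh_n}\,S_{x\lambda}) = O(nh_n)\,O(h_n/n) = O(h_n^{2}) = o(1)$. Hence $\sqrt{nh_n}\,S_{x\lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\lambda}$.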
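The expansion used in part (b-ii) can be sketched as follows. This is a reconstruction from the surrounding definitions rather than the display printed in the original, which did not survive extraction; $c_v$ denotes an intermediate value between 0 and $v h_n$.

```latex
\begin{align*}
E(S_{x\lambda})
  &= \int \frac{1}{h_n} K\!\left(\frac{W}{h_n}\right) W\, g(W)\, dW
   = h_n \int v\, K(v)\, g(v h_n)\, dv \\
  &= \sum_{j=0}^{r-1} \frac{h_n^{j+1}}{j!}\, g^{(j)}(0) \int v^{j+1} K(v)\, dv
     + \frac{h_n^{r+1}}{r!} \int v^{r+1} K(v)\, g^{(r)}(c_v)\, dv \\
  &= \frac{h_n^{r+1}}{r!} \int v^{r+1} K(v)\, g^{(r)}(c_v)\, dv .
\end{align*}
```

The sum in the second line vanishes because Assumption R7(e) sets $\int v^{j} K(v)\,dv = 0$ for $j = 1, \ldots, r$. Multiplying by $\sqrt{nh_n}$ and letting $n \to \infty$ then gives a limit of the form $\tilde h\,(1/r!)\,g^{(r)}(0)\int v^{r+1} K(v)\,dv$ under the stated boundedness conditions.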

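(c-i) Note that $E(S_{x\nu}) = 0$, while by Lemma A1, $\mathrm{var}(S_{x\nu}) = O((nh_n)^{-1})$. Therefore $E(h_n^{-(r+1)} S_{x\nu}) = 0$ and $\mathrm{var}(h_n^{-(r+1)} S_{x\nu}) = O(h_n^{-2(r+1)}(nh_n)^{-1}) = O((\sqrt{nh_n}\,h_n^{r+1})^{-2}) = o(1)$, since by assumption $\sqrt{nh_n}\,h_n^{r+1} \to \infty$ as $n \to \infty$. Thus $h_n^{-(r+1)} S_{x\nu} \xrightarrow{p} 0$.

(c-ii) From part (b-ii) above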

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$.

(iii) In the Taylor series expansions, $c$ stands for a generic value between $W_i$ and $\hat W_i$.

PROOFOF LEMMA2 (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a

(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that $A_1$ and $A_2$ are $O_p(1)$ while $A_3 = o_p(1)$. The desired result will then follow from the fact that $\mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$.

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

We will show that $B_1$ and $B_2$ are $O_p(1)$ while $B_3 = o_p(1)$. The desired result will then follow from the fact that $1 - 2p < \mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$ and $\sqrt{nh_n}\,(\hat\gamma_n - \gamma) = O_p(n^{(1-\mu)/2 - p}) = o_p(1)$.

Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for r r 1

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+

where $A_1$ and $A_2$ are defined as in the proof of part (b-i). As we showed there, both these quantities are bounded in probability for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$ as $n$ increases. Furthermore, from (1) above, $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$. Thus the first two terms of the sum above are $o_p(1)$. Now by (2)

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where $B_1$ and $B_2$ are defined as in the proof of part (b-ii), and as we showed there they are bounded in probability for any $h_n$ that satisfies $nh_n \to \infty$ as $n$ increases. Thus the first two terms of the sum above are $o_p(1)$. Furthermore

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.

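With these definitions we can write $\tilde\beta_n - \beta = S_{xx}^{-1}(S_{x\nu} + S_{x\lambda})$ and $\hat\beta_n - \beta = \hat S_{xx}^{-1}(\hat S_{x\nu} + \hat S_{x\lambda})$.

Our asymptotic results for the infeasible estimator are based on the following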

assumptions From Section 2 = dildi2 ii= ( w ~ wi2 x~ aq) and uit = ditE - Idil = 1 di2 = 1 6) E ( E ~

ASSUMPTION R1: $(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2})$ and $(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1})$ are identically distributed conditional on $\zeta_i$. That is, $F(\varepsilon_{i1}^*, \varepsilon_{i2}^*, u_{i1}, u_{i2} \mid \zeta_i) = F(\varepsilon_{i2}^*, \varepsilon_{i1}^*, u_{i2}, u_{i1} \mid \zeta_i)$.

As discussed in Section 2, this conditional exchangeability assumption is crucial to our method for eliminating the sample selection effect. Although in principle we could allow $F$ to vary across individuals, it will be convenient for our analysis to assume that cross-section sampling is random.

ASSUMPTION a wit u I ~ ) is drawn R2 An iid sample (xT E t = 12 from the population For each i = 1 n and each t = 12 we obserue (djt Wit ~ j t xit)

With this assumption we may from now on drop the subscripts i that denote the identity of each panel member

ASSUMPTION R3: $E(\Delta x'\,\Delta x \mid W = 0)$ is finite and nonsingular.

Note that this assumption implicitly imposes an exclusion restriction on the set of regressors, namely that at least one of the variables in the selection equation, $w_{it}$, is not contained in $x_{it}$.

ASSUMPTION R4: The marginal distribution of the index function $W \equiv \Delta w\,\gamma$ is absolutely continuous, with density function $f_W$ which is bounded from above on its support and strictly positive at zero, i.e., $f_W(0) > 0$. In addition, $f_W$ is almost everywhere $r$ times ($r \ge 1$) continuously differentiable and has bounded derivatives.^8

Observe that by definition Ax= QiAx Thus although certain assumptions are stated in terms of the observed regressors x they also hold for the latent (possibly unobserved) x$

It is possible to relax certain smoothness assumptions so that they hold only in a neighborhood of $W$ near zero, at the cost, though, of more technical detail.

ASSUMPTIONR5 The unknown function9 il(wly + 7w y + 7 J ) = E(E Idl =

l d = l ~ ) ~ E ( ~ ~ I u ~ lt w ~ y + ~ u lt w y + _ r ] J )A(s s J ) -satisfies A(s_sJ)=il(s-s) for t r = 1 2 where A is afunction of (ss J ) ieA = Ais s 5 1 which is bounded on its support

This assumption is crucial to our analysis. It will be satisfied, for example, if $\Lambda$ is continuously differentiable with respect to its first two arguments, with bounded first-order partial derivatives (as, for example, when the errors are jointly normally distributed), in which case we may apply the multivariate mean-value theorem.

Here A(]) (j = 12) denotes the first-order partial derivative of A with respect to its first and second argument respectively and c lies on the line segment connecting (w y + r ] w y + 7 ) and (w + 7 wl y + 7 J ) Thus in this case A = 11(2)(~1Acl)(cT)- ) and by assumption will be bounded

ASSUMPTION R6: (a) $x_t^*$ and $\varepsilon_t^*$ have bounded $4 + 2\delta$ moments conditional on $W$, for any $\delta \in (0,1)$.

(b) $E(\Delta x'\,\Delta x \mid W)$ and $E(\Delta x'\,\Delta x\,\Delta\nu^2 \mid W)$ are continuous at $W = 0$ and do not vanish.

(c) $E(\Delta x'\,\Lambda \mid W)$ is almost everywhere $r$ times continuously differentiable as a function of $W$ and has bounded derivatives.

ASSUMPTION R7: The function $K: \mathbb{R} \to \mathbb{R}$ satisfies: (a) $\int K(v)\,dv = 1$, (b) $\int |K(v)|\,dv < \infty$, (c) $\sup_v |K(v)| < \infty$, (d) $\int |v|^{r+1} |K(v)|\,dv < \infty$, and (e) $\int v^{j} K(v)\,dv = 0$ for all $j = 1, \ldots, r$.

ASSUMPTION R8: $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$.

From our analysis in Section 2 it is easy to see that Assumptions R1-R3 would suffice to identify $\beta$ for known $\gamma$. An identification scheme in the spirit of our discussion in Section 2 would obviously require support of $W$ at zero, as well as nonsingularity of the matrix $\Sigma_{xx}$ imposed by Assumption R3, analogous to the familiar full rank assumption.

The continuity of the distribution of the index $W$ imposed in Assumption R4 is a regularity condition common in kernel estimation of density and regression functions. It is precisely this continuity that renders the estimator of $\beta$ of Section 2 infeasible, even if $\gamma$ were known.

9 Notice that by Assumption R1 the functional form of $\Lambda$ is the same over time for the same individual, while by Assumption R2 it is also the same across individuals.

10 In principle we could dispense with the assumption that 11 is bounded by assuming that has finite fourth moment conditional on 1V

Since our estimation scheme is based on pairs of observations for which $W = \Delta w\,\gamma \cong 0$, it is obvious that additional smoothness conditions are required.

These are imposed by Assumptions R4-R8. Notice in particular Assumption R5, which imposes a Lipschitz continuity property on the selection correction function $\Lambda(\cdot)$. It is easy to see that simple continuity will not be sufficient to guarantee that $\Delta\lambda_i \to 0$ as $W_i \to 0$, since $\Delta\lambda_i$ is not a function of $W_i$. Furthermore, similarly to kernel density and regression estimation, a high order of differentiability $r$ for certain functions of the index $W$, along with the appropriate choice of the kernel function and the bandwidth sequence, imply a faster rate of convergence in distribution for $\tilde\beta_n$. Specifically, we choose a $(r+1)$th order bias-reducing kernel, which by Assumption R7(e) is required to be negative in part of its domain.

The next lemma establishes the asymptotic properties of the infeasible estimator $\tilde\beta_n$.
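To make the role of the kernel weights concrete, the following sketch computes a kernel-weighted first-difference least squares estimate for a single regressor. It is an illustration under assumptions, not the paper's implementation: it takes the estimator to be a weighted least squares regression of $\Delta y$ on $\Delta x$ over individuals selected in both periods, with weights $(1/h_n)K(W_i/h_n)$ as in Section 2, and all function and argument names are hypothetical.

```python
import math


def normal_kernel(v):
    """Standard normal density, a second order kernel."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def kernel_weighted_fd(dy, dx, index_diff, both_selected, bandwidth,
                       kernel=normal_kernel):
    """Kernel-weighted first-difference estimate for a scalar regressor.

    dy, dx        : first differences of outcome and regressor
    index_diff    : W_i, the first difference of the selection index
    both_selected : 1 if the individual is observed in both periods, else 0
    """
    numerator = 0.0
    denominator = 0.0
    for dy_i, dx_i, w_i, phi_i in zip(dy, dx, index_diff, both_selected):
        if not phi_i:
            continue  # only pairs observed in both periods contribute
        weight = kernel(w_i / bandwidth) / bandwidth
        numerator += weight * dx_i * dy_i
        denominator += weight * dx_i * dx_i
    return numerator / denominator


# Noise-free toy data with slope 2; the third unit is not selected in both periods.
estimate = kernel_weighted_fd(
    dy=[2.0, -4.0, 1.0, 6.0],
    dx=[1.0, -2.0, 0.5, 3.0],
    index_diff=[0.1, -0.3, 0.0, 2.0],
    both_selected=[1, 1, 0, 1],
    bandwidth=0.5,
)
```

Observations with $W_i$ far from zero receive little weight, which is how pairs with approximately equal selection effects come to dominate the estimate as the bandwidth shrinks.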

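LEMMA 1: Let Assumptions R1-R8 hold. Define

$\Sigma_{xx} \equiv f_W(0)\,E(\Delta x'\,\Delta x \mid W = 0)$,

$\Sigma_{x\nu} \equiv f_W(0)\,E(\Delta x'\,\Delta x\,\Delta\nu^2 \mid W = 0)\int K(v)^2\,dv$,

$\Sigma_{x\lambda} \equiv (1/r!)\,g^{(r)}(0)\int v^{r+1} K(v)\,dv$,

where $g^{(r)}(0)$ is the $(k \times 1)$ vector of $r$th-order derivatives of $g(W) \equiv E(\Delta x'\,\Lambda \mid W)\,f_W(W)$ evaluated at $W = 0$. Then:

(a) $S_{xx} \xrightarrow{p} \Sigma_{xx}$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}\,S_{x\nu} \xrightarrow{d} N(0, \Sigma_{x\nu})$ and (ii) $\sqrt{nh_n}\,S_{x\lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\lambda}$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)} S_{x\nu} \xrightarrow{p} 0$ and (ii) $h_n^{-(r+1)} S_{x\lambda} \xrightarrow{p} \Sigma_{x\lambda}$.

The asymptotic properties of $\tilde\beta_n$ easily follow from the previous Lemma. If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}\,(\tilde\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\tilde\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.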
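The step from Lemma 1 to the limiting distribution of the infeasible estimator can be sketched as follows, assuming the decomposition $\tilde\beta_n - \beta = S_{xx}^{-1}(S_{x\nu} + S_{x\lambda})$ stated before the assumptions and writing $\tilde h$ for the limit of $\sqrt{nh_n}\,h_n^{r+1}$ (the symbol for that limit is a notational choice made here).

```latex
\begin{align*}
\sqrt{nh_n}\,(\tilde\beta_n - \beta)
  &= S_{xx}^{-1}\bigl(\sqrt{nh_n}\,S_{x\nu} + \sqrt{nh_n}\,S_{x\lambda}\bigr), \\
S_{xx}^{-1} &\xrightarrow{p} \Sigma_{xx}^{-1}, \qquad
\sqrt{nh_n}\,S_{x\nu} \xrightarrow{d} N(0, \Sigma_{x\nu}), \qquad
\sqrt{nh_n}\,S_{x\lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\lambda}, \\
\sqrt{nh_n}\,(\tilde\beta_n - \beta)
  &\xrightarrow{d} N\bigl(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\;
     \Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1}\bigr).
\end{align*}
```

The last line follows from Slutsky's theorem; the nonsingularity of $\Sigma_{xx}$ needed for the inverse is supplied by Assumption R3 together with $f_W(0) > 0$ from Assumption R4.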

In order to derive the asymptotic properties of the feasible estimator $\hat\beta_n$, we will make the following additional assumptions.

ASSUMPTIONR9 In addition to the conditions of Assumption R7 the kernel function satisfies (a) K ( v ) is three times continuously differentiable with bounded deriuatiues and (b) IKr(vgtldv lIK(v)l dv l ~ ~ K ( v ) ~ d v and ~ v ~ K ( v ) ~ ~ v are finite

The conditions of Assumption R9 are satisfied, for example, for $K(v)$ being the standard normal density function, which is a second order kernel.

ASSUMPTIONR10 xT 87 and w have bounded 8 + 46 moments conditional on W for some 6 E (0 1) In addition E(Axl A u Awj 1 W) and E(AX Au Awj Awm IW) are continuous at W = 0 for all 1 = 1 k and j m =

1 q

ASSUMPTION R11: The parameter vector $\gamma$ in the selection equation lies in a compact^11 set, and $\hat\gamma_n$ is a consistent estimator that satisfies $\hat\gamma_n - \gamma = O_p(n^{-p})$, where $2/5 < p \le 1/2$.

For example, $p = 1/2$ if $\gamma$ is estimated by maximizing the conditional likelihood function.

ASSUMPTION R12: $h_n = h\,n^{-\mu}$, where $0 < h < \infty$ and $1 - 2p < \mu < p/2$.

Assumption R12 is crucial for establishing the result that follows. This result states that $\hat S_{xx}$, $\hat S_{x\nu}$, and $\hat S_{x\lambda}$ have the same probability limits as their infeasible counterparts $S_{xx}$, $S_{x\nu}$, and $S_{x\lambda}$, provided that the bandwidth sequence $h_n$ is chosen appropriately for any given rate of convergence of the first-step estimator, that is, for any given $p$, and for any degree of smoothness $r$.

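LEMMA 2: Let Assumptions R1-R12 hold. Then:

(a) $\hat S_{xx} - S_{xx} = o_p(1)$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}\,(\hat S_{x\nu} - S_{x\nu}) = o_p(1)$ and (ii) $\sqrt{nh_n}\,(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}(\hat S_{x\nu} - S_{x\nu}) = o_p(1)$ and (ii) $h_n^{-(r+1)}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

Lemma 2 readily implies that if $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}\,(\hat\beta_n - \tilde\beta_n) = o_p(1)$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \tilde\beta_n) = o_p(1)$. Since $(\hat\beta_n - \beta) = (\hat\beta_n - \tilde\beta_n) + (\tilde\beta_n - \beta)$, we have the following theorem.

THEOREM 1: Let Assumptions R1-R12 hold.

(a) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then $\sqrt{nh_n}\,(\hat\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.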

11 Compactness of the parameter space is required for consistency of both Manski's estimator and the smoothed conditional maximum score estimator, while it is not required for the conditional maximum likelihood estimator. Notice, though, that since $\gamma$ can only be estimated up to scale, we can always normalize it so that it lies on the unit circle. Thus the compactness assumption is not restrictive.

Thus, in the limit, the fact we are using $\hat\gamma_n$ to estimate $\beta$ does not affect the asymptotic distribution of $\hat\beta_n$. The lower bound on $\mu$ imposed by Assumption R12 is the key for this result to hold. In words, this bound implies that $\beta$ is estimated at a rate slower than $\gamma$. Indeed, from Theorem 1, the rate of convergence of $\hat\beta_n$ is $(nh_n)^{-1/2} \propto n^{-(1-\mu)/2}$, which is obviously slower than $n^{-p}$ since $\mu > 1 - 2p$. Thus, in effect, Assumption R12 requires that $\sqrt{nh_n}\,(\hat\gamma_n - \gamma) = o_p(1)$.
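The interplay between the first-step rate $p$ and the bandwidth exponent $\mu$ can be tabulated with a few lines of code. This is a minimal sketch, assuming the bounds $2/5 < p \le 1/2$ and $1 - 2p < \mu < p/2$ of Assumptions R11 and R12; the function names are hypothetical.

```python
def admissible_mu_range(p):
    """Open interval (1 - 2p, p / 2) of bandwidth exponents allowed for a given p."""
    if not 0.4 < p <= 0.5:
        raise ValueError("the first-step rate exponent must satisfy 2/5 < p <= 1/2")
    return (1.0 - 2.0 * p, p / 2.0)


def second_step_rate_exponent(mu):
    """Exponent a such that (n h_n)^(-1/2) is proportional to n^(-a) when h_n = h n^(-mu)."""
    return (1.0 - mu) / 2.0


lower, upper = admissible_mu_range(0.5)   # root-n consistent first step
rate = second_step_rate_exponent(0.2)     # second order kernel, h_n proportional to n^(-1/5)
```

For a root-n consistent first step the admissible exponents are those strictly between 0 and 1/4, so the choice $\mu = 1/5$ used in the Monte Carlo tables is allowed and yields the rate $n^{-2/5}$, slower than the first-step rate $n^{-1/2}$.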

In principle we could allow $\beta$ to be estimated at the same rate as $\gamma$. Thus, if $\sqrt{nh_n}\,(\hat\gamma_n - \gamma) = O_p(1)$, for $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ we obtain the following asymptotic representation, which may be easily derived from the analysis of Lemma 2(b) in the Appendix:

where

n

0= ( l h ~ ) ~ ( ~ i h )plim (ln) Ax Awi Ahi Qi i = 1

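provided that $E(\Delta x'\,\Delta w\,\Lambda\,\Phi \mid W)$ is continuous at $W = 0$ and $vK(v) \to 0$ as $|v| \to \infty$. Asymptotic normality of $\hat\beta_n$ may still be established if $\sqrt{nh_n}(\hat\gamma_n - \gamma)$ has an asymptotic representation of the form $\sqrt{nh_n}(\hat\gamma_n - \gamma) = (1/\sqrt{n})\sum_{i=1}^{n}\xi_{in}(\Delta w_i, \Delta d_i, \gamma) + o_p(1)$.$^{12}$

At first glance it looks attractive to eliminate the asymptotic bias of $\hat\beta_n$ by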

choosing $h_n$ so that $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h = 0$, or equivalently by setting $\mu > 1/(2(r+1)+1)$. In that case, however, the rate of convergence of $\hat\beta_n$ is lower than when $\tilde h > 0$. Indeed, the rate of convergence in distribution of $\hat\beta_n$ is maximized by making $\mu$ as small as possible, that is, by setting $\mu = 1/(2(r+1)+1)$, in which case it becomes $n^{-(r+1)/(2(r+1)+1)}$. Thus, for $r$ large enough, the estimator converges at a rate that can be arbitrarily close to $n^{-1/2}$, provided also that $\gamma$ is estimated fast enough, that is, provided $p > (r+1)/(2(r+1)+1)$.
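As a small numerical illustration of the rate discussion above, the following sketch tabulates the bandwidth exponent $1/(2(r+1)+1)$ and the implied convergence exponent $(r+1)/(2(r+1)+1)$ for a few values of $r$. The function names are illustrative only and do not come from the paper.

```python
from fractions import Fraction


def optimal_mu(r: int) -> Fraction:
    """Bandwidth exponent mu = 1 / (2(r+1) + 1) that maximizes the rate."""
    return Fraction(1, 2 * (r + 1) + 1)


def rate_exponent(r: int) -> Fraction:
    """Exponent a such that the estimator converges at rate n**(-a)
    when the bandwidth is h * n**(-optimal_mu(r))."""
    return Fraction(r + 1, 2 * (r + 1) + 1)


if __name__ == "__main__":
    # The first-step estimator must converge faster than n**(-a),
    # i.e. its rate exponent p must exceed rate_exponent(r).
    for r in (1, 3, 5, 50):
        a = rate_exponent(r)
        print(f"r={r:2d}  mu={optimal_mu(r)}  rate=n^(-{a})  ~ n^(-{float(a):.4f})")
```

With $r = 1$ the exponent is $2/5$, and it approaches but never reaches $1/2$ as $r$ grows.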

Although the proposed estimator is asymptotically biased, it is possible to eliminate the asymptotic bias while maintaining the maximal rate of convergence, in the manner suggested by Bierens (1987).

COROLLARY: Let $\hat\beta_n$ be the estimator with window width $h_n = h\,n^{-1/(2(r+1)+1)}$, and $\hat\beta_{n,\delta}$ the estimator with window width $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, where $\delta \in (0,1)$.

12 We can also derive an asymptotic representation for $\hat\beta_n$ in the case where $\gamma$ is estimated at a rate $n^{-p}$ that is slower than $1/\sqrt{nh_n}$. In this case we obtain $n^{p}(\hat\beta_n - \beta) = \Sigma_{xx}^{-1}\Omega\,n^{p}(\hat\gamma_n - \gamma) + o_p(1)$, which implies that $\hat\beta_n$ converges at the same rate as $\hat\gamma_n$, which is slower than the optimal rate obtained for the infeasible estimator $\tilde\beta_n$, that is, when $\gamma$ is known.


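Define

$$\hat{\hat\beta}_n = \frac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\,\hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}}.$$

Then $n^{(r+1)/(2(r+1)+1)}(\hat{\hat\beta}_n - \beta) \xrightarrow{d} N(0,\ h^{-1}\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.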
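The linear combination in the Corollary can be sketched in a few lines. This is a minimal illustration, assuming the two estimates have already been computed with the two window widths; the function and argument names are hypothetical.

```python
import numpy as np


def bias_corrected(beta_n, beta_n_delta, n, r, delta):
    """Combine two estimates as in the Corollary.

    beta_n       : estimate computed with h * n**(-1 / (2(r+1)+1))
    beta_n_delta : estimate computed with h * n**(-delta / (2(r+1)+1))
    Both must use the same constant h; delta must lie in (0, 1).
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    # Weight n**(-(1 - delta)(r + 1) / (2(r + 1) + 1)) from the Corollary.
    a = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    beta_n = np.asarray(beta_n, dtype=float)
    beta_n_delta = np.asarray(beta_n_delta, dtype=float)
    return (beta_n - a * beta_n_delta) / (1.0 - a)
```

The weight is chosen so that the leading bias terms of the two estimates, which are proportional to $h_n^{r+1}$ and $h_{n,\delta}^{r+1}$, cancel exactly in the numerator.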

In order to compute $\hat\beta_n$ or $\hat{\hat\beta}_n$ in an application, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator will likely be more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant $h$. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.

For a given order of differentiability $r$ and a given sample size $n$, the results of Theorem 1 suggest that $h_n = h\,n^{-\mu}$ be chosen so that $\mu = 1/(2(r+1)+1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant $h$. A natural way to proceed (see Horowitz (1992) and Härdle (1990)) is to choose $h$ so as to minimize some kind of measure of the distance of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

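$$\mathrm{MSE} = n^{-2(r+1)/(2(r+1)+1)}\ \mathrm{trace}\!\left[A\,\Sigma_{xx}^{-1}\left(h^{2(r+1)}\,\Sigma_{x\lambda}\Sigma_{x\lambda}' + h^{-1}\,\Sigma_{xv}\right)\Sigma_{xx}^{-1}\right]$$

for any nonstochastic positive semidefinite matrix $A$ that satisfies $\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda} > 0$. It is straightforward to show that MSE is minimized by setting

$$(3.21)\qquad h = h^{*} = \left[\frac{\mathrm{trace}\left[\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{xv}\right]}{2(r+1)\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda}}\right]^{1/(2(r+1)+1)}.$$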
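The optimal constant can be evaluated numerically once estimates of the three matrices are in hand. The sketch below assumes that the criterion has a squared-bias term growing like $h^{2(r+1)}$ and a variance term shrinking like $h^{-1}$, as implied by the Corollary; the function name and the default choice $A = I$ are assumptions made for illustration.

```python
import numpy as np


def optimal_constant(sigma_xx, sigma_xv, sigma_xl, r, weight=None):
    """Bandwidth constant minimizing
    h**(2(r+1)) * bias_term + variance_term / h.

    sigma_xx, sigma_xv : (k, k) arrays; sigma_xl : length-k array.
    weight : positive semidefinite (k, k) matrix A; identity if omitted.
    """
    sigma_xx = np.atleast_2d(np.asarray(sigma_xx, dtype=float))
    sigma_xv = np.atleast_2d(np.asarray(sigma_xv, dtype=float))
    sigma_xl = np.asarray(sigma_xl, dtype=float).reshape(-1)
    k = sigma_xx.shape[0]
    weight = np.eye(k) if weight is None else np.asarray(weight, dtype=float)

    inv_xx = np.linalg.inv(sigma_xx)
    variance_term = np.trace(inv_xx @ weight @ inv_xx @ sigma_xv)
    bias_direction = inv_xx @ sigma_xl
    bias_term = float(bias_direction @ weight @ bias_direction)
    if bias_term <= 0.0:
        raise ValueError("the weighted squared bias must be strictly positive")

    order = 2 * (r + 1)
    return float((variance_term / (order * bias_term)) ** (1.0 / (order + 1)))
```

In practice the inputs would be the consistent estimates discussed in the text, so the returned value is itself only an estimate of the optimal constant.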

This last expression suggests that we may construct a consistent estimate of $h^{*}$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{xv}$ and $\Sigma_{x\lambda}$.$^{13}$

THEOREM 2: Assume that Assumptions R1-R12 hold. (a) Let $\hat\beta_n$ be a consistent estimator of $\beta$ based on $h_n = h\,n^{-1/(2(r+1)+1)}$, and define $\hat e_i = \Delta y_i - \Delta x_i\hat\beta_n$.

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's World Wide Web page.


Then

(b) Let $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then, for $\hat e_i$ defined as in part (a),

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given $r$ and $n$, choose any $h_n = h\,n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, with $h$ an arbitrary positive constant and $0 < \delta < 1$. Compute $\hat\beta_n$ based on $h_n$ and construct $\hat e_i$ as defined in Theorem 2. Use $\hat e_i$ to compute the estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ as discussed above. Then estimate $h^{*}$ by $\hat h$ using equation (3.21), with $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the plug-in method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant $h$. Second, by specifying the order of differentiability $r$, the researcher is restricted to a certain smoothness class.

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h\,n^{-1/(2(r+1)+1)}$, run an OLS regression of $\tilde y_i = \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta y_i\,\Phi_i$ on $\tilde x_i = \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta x_i\,\Phi_i$ and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix, using the residuals from the regression, $\tilde e_i = \tilde y_i - \tilde x_i\hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\tilde y_i$ on $\tilde x_i$ with bandwidth $h_n = h_{n,\delta}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$

in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency$^{14}$ of $\hat\beta_n$ if $h_n^{-2}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e. for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$.

The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are

satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-$n$ asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3}h_n^{2} \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator which, under weak (but stronger than Manski's) assumptions, is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness

14 Consistency of $\hat\beta_n$ may be established under the weaker restriction that $h_n^{-1}\|\hat\gamma_n - \gamma\| = o_p(1)$. The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion. This modification does not alter the basic restriction for obtaining an asymptotic distribution for $\hat\beta_n$ which does not depend on the estimation of $\gamma$ in the first step, namely that $\gamma$ has to be estimated at a faster rate than $\beta$. Notice that in this case the upper bound on $\mu$ in Assumption R12 would have to be replaced by $(6p-1)/7$. However, this modification would affect the proof of Theorem 2, which would become unnecessarily complicated and long.


we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).
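To make the OLS recipe for the main equation given earlier in this section concrete, here is a minimal sketch, assuming a Gaussian kernel, first-differenced data held in NumPy arrays, and an HC0 version of the Eicker-White covariance matrix. The function and argument names are hypothetical, and the first-step estimate `gamma_hat` is taken as given.

```python
import numpy as np


def gaussian_kernel(v):
    """Standard normal density, a second order kernel."""
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)


def kernel_weighted_ols(dy, dx, dw, both_selected, gamma_hat, h_n,
                        kernel=gaussian_kernel):
    """OLS of sqrt(K) * dy * Phi on sqrt(K) * dx * Phi.

    dy : (n,) first differences of the outcome
    dx : (n, k) first differences of the main-equation regressors
    dw : (n, q) first differences of the selection-equation regressors
    both_selected : (n,) boolean, True when selected in both periods
    Returns the coefficient vector and Eicker-White standard errors.
    """
    sel = np.asarray(both_selected, dtype=bool)
    # Unselected rows get zero weight; replace their values by zeros so
    # that missing outcomes (e.g. NaN) cannot contaminate the sums.
    dy = np.where(sel, np.asarray(dy, dtype=float), 0.0)
    dx = np.where(sel[:, None], np.asarray(dx, dtype=float), 0.0)
    index = np.asarray(dw, dtype=float) @ np.asarray(gamma_hat, dtype=float)
    root_w = np.sqrt(kernel(index / h_n)) * sel

    y_t = root_w * dy
    x_t = root_w[:, None] * dx
    xtx = x_t.T @ x_t
    beta = np.linalg.solve(xtx, x_t.T @ y_t)

    resid = y_t - x_t @ beta
    bread = np.linalg.inv(xtx)
    meat = (x_t * resid[:, None] ** 2).T @ x_t
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))
```

Calling the function twice, once with $h_n$ and once with $h_{n,\delta}$, gives the two estimates that the Corollary combines into the bias-corrected estimate.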

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, the first step estimation method, the performance in practice of the proposed plug-in method for estimating the bandwidth constant, and, finally, the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where $\beta_0 = 1$; $\gamma_{10} = \gamma_{20} = 1$; $w_{1it}$ and $w_{2it}$ are independent $N(-1, 1)$ variables; $\eta_i = (w_{1i1} + w_{1i2})/2 + 2\zeta_{1i}$, with $\zeta_{1i}$ an independent variable distributed uniformly over the interval $(0, 1)$; $u_{it}$ is logistically distributed, normalized to have variance equal to 1; $x_{it} = w_{2it}$; $\alpha_i = (w_{2i1} + w_{2i2})/2 + \zeta_{2i}$, with $\zeta_{2i}$ an independent $N(0, 2)$ variable; and $\varepsilon_{it} = 0.8\,\zeta_{3i} + 0.6\,u_{it}$, with $\zeta_{3i}$ an independent standard normal variable. All data are generated i.i.d. across individuals and over time. This design implies that $\Pr(d_{i1} + d_{i2} = 1) = 0.37$ and $\Pr(d_{i1} = d_{i2} = 1) = 0.31$, so that approximately 37 percent of each sample is used in the first step estimation of the selection equation and approximately 31 percent in the second step. Each Monte Carlo experiment is performed 1000 times, while the same pseudorandom number sequences are used for each one of three different sample sizes $n$: 250, 1000, and 4000.

Table I presents the finite sample properties of the "naive" estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences, using only those individuals that are selected into the sample in both time periods, i.e. those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator, with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median


TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$
Columns: Mean Bias, Median Bias, RMSE, MAD

Panel B: Sizes of $t$ tests
Columns: 0.01, 0.05, 0.10, 0.20

bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$ at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(v) = \phi(v)$, where $\phi$ is the density of the standard normal distribution,

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Left block: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right block: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).
Columns in each block: Mean Bias, Median Bias, RMSE, MAD

Panel A: True $\gamma$
0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B: $\hat\gamma_L$
0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C: $\hat\gamma_{CMS}$
0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D: $\hat\gamma_{SCMS4}$
0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E: $\hat\gamma_{SCMS2}$
0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098


which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h\,n^{-1/(2(r+1)+1)} = h\,n^{-1/5}$ with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.$^{15}$ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_L$, which in this case will be consistent since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,$^{16}$ denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator, denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E) the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window

15 In the construction of the kernel weights of both the infeasible estimator $\tilde\beta_n$ of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i=1}^{n}\Delta d_i\,1\{\Delta w_{1i}g_1 + \Delta w_{2i}g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.

17 The SCMS estimates are computed by maximizing

over all $g \in \mathbb{R}^2$ that have $|g_2| = 1$ and $g_1$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(v) = K_4(v)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(v) = \Phi(v)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5\,n^{-1/(2m+1)}$ ($m = 2$ or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).


width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.
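The grid search used for the conditional maximum score (CMS) estimates in footnote 16 can be sketched as follows. This is an illustration under the stated reading of that footnote (two selection regressors, candidates on the unit circle); the function name is hypothetical.

```python
import numpy as np


def cms_grid_search(dd, dw, n_grid=2000):
    """Maximize (1/n) * sum_i dd_i * 1{dw_i . g >= 0} over the unit circle.

    dd : (n,) first differences of the selection indicator (-1, 0 or 1)
    dw : (n, 2) first differences of the two selection regressors
    Candidates are g = (sin(a), cos(a)) for a on an equispaced grid
    of n_grid points from 0 to 2*pi.
    """
    dd = np.asarray(dd, dtype=float)
    dw = np.asarray(dw, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid)
    candidates = np.column_stack([np.sin(angles), np.cos(angles)])
    index = dw @ candidates.T                      # shape (n, n_grid)
    objective = (dd[:, None] * (index >= 0.0)).mean(axis=0)
    best = int(np.argmax(objective))               # first maximizer on ties
    return candidates[best], float(objective[best])
```

Because the objective is a step function, the maximizer is generally a set rather than a point; `np.argmax` simply returns the first grid point that attains the maximum.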

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\hat\beta_n$ and $\hat{\hat\beta}_n$ using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).$^{18}$
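The standard error of a Monte Carlo median bias mentioned in footnote 18 requires a density estimate at the median. The sketch below assumes the usual asymptotic formula $1/(2\hat f(m)\sqrt{N})$ for the standard error of a sample median and takes the rule-of-thumb bandwidth to be $1.06\,\hat\sigma N^{-1/5}$; both choices are assumptions about what footnote 18 refers to, and the function name is hypothetical.

```python
import numpy as np


def median_standard_error(draws):
    """Approximate standard error of the sample median of `draws`.

    Uses 1 / (2 * f_hat(median) * sqrt(N)), where f_hat is a normal-kernel
    density estimate with bandwidth 1.06 * sd * N**(-1/5) (assumed rule).
    """
    x = np.asarray(draws, dtype=float)
    n_draws = x.size
    med = np.median(x)
    bandwidth = 1.06 * x.std(ddof=1) * n_draws ** (-0.2)
    z = (med - x) / bandwidth
    density = np.mean(np.exp(-0.5 * z ** 2)) / (bandwidth * np.sqrt(2.0 * np.pi))
    return 1.0 / (2.0 * density * np.sqrt(n_draws))
```

Here `draws` would be the 1000 replications of the estimator for one design; since the median bias is the median minus a constant, it has the same standard error.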

In Panels C and D we use a fourth and a sixth order bias-reducing kernel$^{19}$ and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.
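The bias-reducing property of the higher order kernels in footnote 19 can be checked numerically: a kernel of order four has a vanishing second moment, and a kernel of order six has vanishing second and fourth moments. The sketch below codes the two kernels as mixtures of Gaussian shapes with variances 1 and 11 (fourth order) and 1, 4 and 9 (sixth order), which is how the footnote is read here, and verifies the moments with a simple Riemann sum.

```python
import numpy as np


def k4(v):
    """Fourth-order kernel: 1.1*exp(-v^2/2) - 0.1*exp(-v^2/22)/sqrt(11)."""
    v = np.asarray(v, dtype=float)
    return 1.1 * np.exp(-v ** 2 / 2.0) - 0.1 * np.exp(-v ** 2 / 22.0) / np.sqrt(11.0)


def k6(v):
    """Sixth-order kernel built from Gaussian shapes with variances 1, 4, 9."""
    v = np.asarray(v, dtype=float)
    return (1.5 * np.exp(-v ** 2 / 2.0)
            - 0.6 * np.exp(-v ** 2 / 8.0) / 2.0
            + 0.1 * np.exp(-v ** 2 / 18.0) / 3.0)


def moment(kernel, j, half_width=80.0, step=1e-3):
    """Riemann-sum approximation of the integral of v**j * kernel(v)."""
    v = np.arange(-half_width, half_width + step, step)
    return float(np.sum(v ** j * kernel(v)) * step)


if __name__ == "__main__":
    for name, kern, orders in (("K4", k4, (0, 2, 4)), ("K6", k6, (0, 2, 4, 6))):
        print(name, [round(moment(kern, j), 6) for j in orders])
```

As written, neither kernel integrates to one (each integrates to $\sqrt{2\pi}$); this does not matter for the estimator, in which the kernel enters both the numerator and the denominator.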

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h$ is used, while their RMSE decreases (except in Panel IV-D). This is because, in general, $\hat h$ is larger than the initial constant (here the initial bandwidth constant is set equal to one$^{20}$). Table V displays the mean of $\hat h$ across 1000 replications for different specifications of the initial constant, for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be explained by the asymptotic bias term being, in general, poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that, for the sample sizes considered here, the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias, we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

19 The fourth-order kernel is $K_4(v) = 1.1\exp(-v^2/2) - 0.1\exp(-v^2/22)(1/\sqrt{11})$ and the sixth-order kernel is $K_6(v) = 1.5\exp(-v^2/2) - 0.6\exp(-v^2/8)(1/\sqrt{4}) + 0.1\exp(-v^2/18)(1/\sqrt{9})$. See Bierens (1987).

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size $n = 100000$.


TABLE III

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: TRUE $\gamma$

Left block: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right block: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).
Columns in each block: Mean Bias, Median Bias, RMSE, MAD

Panel A: $K(v) = \phi(v)$, $h_n = 0.5\,n^{-1/5}$
0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B: $K(v) = \phi(v)$, $h_n = 3\,n^{-1/5}$
0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C: $K(v) = K_4(v)$, $h_n = n^{-1/9}$
0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D: $K(v) = K_6(v)$, $h_n = n^{-1/13}$
0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

Notes: The estimated standard errors of the mean bias estimates for $n$ = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A, and 0.0045, 0.0026, and 0.0014 for Panel B, respectively. The estimated standard errors of the median bias estimates for $n$ = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A, and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = \hat h\,n^{-1/5}$, INITIAL $h = 1$, $K(v) = \phi(v)$

Left block: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right block: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).
Columns in each block: Mean Bias, Median Bias, RMSE, MAD

Panel A: True $\gamma$
0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B: $\hat\gamma_L$
0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C: $\hat\gamma_{CMS}$
0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D: $\hat\gamma_{SCMS4}$
0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401


TABLE V

Columns: Initial $h = 0.5$, Initial $h = 1$, Initial $h = 2$, Initial $h = 3$

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h_n = n^{-1/5}$, $K(v) = \phi(v)$. Row shown: True $\gamma$ (Figs. 1a, 1b, 1c).

Note: Figures 1a, 1d, 1g: $n = 250$; Figures 1b, 1e, 1h: $n = 1000$; Figures 1c, 1f, 1i: $n = 4000$.


estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.
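The coordinates of a quantile-quantile plot of the kind described above can be computed without any plotting library. The sketch below pairs the sorted Monte Carlo estimates with the quantiles of a normal distribution that has the same mean and variance; the plotting positions $(i - 0.5)/N$ are an assumption, since the text does not say which convention was used.

```python
import numpy as np
from statistics import NormalDist


def qq_points(estimates):
    """Return (normal_quantiles, sample_quantiles) for a Q-Q plot.

    The reference normal has the sample mean and sample variance of
    `estimates`; plotting positions are (i - 0.5) / N, i = 1, ..., N.
    """
    x = np.sort(np.asarray(estimates, dtype=float))
    n_draws = x.size
    probs = (np.arange(1, n_draws + 1) - 0.5) / n_draws
    reference = NormalDist(mu=float(x.mean()), sigma=float(x.std(ddof=1)))
    normal_q = np.array([reference.inv_cdf(float(p)) for p in probs])
    return normal_q, x
```

If the finite sample distribution is close to normal, the returned pairs lie near the 45-degree line.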

Finally, we examine the size of $t$ tests, where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$. To this end we construct $t$ statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the $t$ tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests, even in this case, are reasonable. Alternatively, we could have used bootstrap standard errors.
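The empirical sizes reported in a table of this kind are rejection frequencies across replications. A minimal sketch follows, assuming two-sided tests with standard normal critical values and one estimate and one standard error per replication; the function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def empirical_size(estimates, std_errors, true_value,
                   levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of replications in which |t| exceeds the normal critical value."""
    t_stats = (np.asarray(estimates, dtype=float) - true_value) \
        / np.asarray(std_errors, dtype=float)
    sizes = {}
    for level in levels:
        critical = NormalDist().inv_cdf(1.0 - level / 2.0)
        sizes[level] = float(np.mean(np.abs(t_stats) > critical))
    return sizes
```

A test with correct size rejects a true null in a fraction of samples close to the nominal level, which is the comparison the text draws.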

TABLE VI

SIZE OF $t$ TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Left block: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right block: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).
Columns in each block: 0.01, 0.05, 0.10, 0.20

Panel A: True $\gamma$
0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B: $\hat\gamma_L$
0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C: $\hat\gamma_{CMS}$
0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D: $\hat\gamma_{SCMS4}$
0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250


5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the "fixed-effects" approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMA A1: Let $S_n = (1/n)\sum_{i=1}^{n}(1/h_n)\,L(W_i/h_n)\,Z_i\,W_i^{s}$, $s \ge 0$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution that has $E(|Z|^{2} \mid W) < M < \infty$ for almost all $W$, and the function $L$ satisfies $\int |v^{s}L(v)|\,dv < M$. Then $E(S_n) = O(h_n^{s})$ and $\mathrm{var}(S_n) = O(h_n^{2s}/(nh_n))$. Thus, for $s \ge 1$, $S_n \xrightarrow{p} 0$, while for $s = 0$, $S_n \xrightarrow{p} f_W(0)\,E(Z \mid W = 0)\int L(v)\,dv$, provided that $E(Z \mid W)$ is continuous at $W = 0$.


PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

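LEMMA A2 (Liapounov CLT for double arrays): Let $\bar\xi_n = (1/n)\sum_{i=1}^{n}\xi_{in}$, where $\{\xi_{in}\}$ is an independent sequence of scalar random variables that satisfies $E(\xi_{in}) = 0$, $\mathrm{var}(\xi_{in}) < \infty$, $\mathrm{var}(\sqrt{n}\,\bar\xi_n) \to V < \infty$, and $\sum_{i=1}^{n}E|\xi_{in}/\sqrt{n}|^{2+\delta} \to 0$ for some $\delta \in (0,1)$ as $n \to \infty$. Then $\sqrt{n}\,\bar\xi_n \xrightarrow{d} N(0, V)$.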

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1973).

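COROLLARY A1: Let $\xi_{in} = (1/\sqrt{h_n})\,L(W_i/h_n)\,Z_i$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution such that $E(Z \mid W) = 0$ and $E(|Z|^{2+\delta} \mid W) < M < \infty$ for almost all $W$, $E(Z^{2} \mid W)$ is continuous at $W = 0$, and the function $L$ satisfies $\int |L(v)|^{2+\delta}\,dv < \infty$. Then $\sqrt{nh_n}\,S_n = (1/\sqrt{n})\sum_{i=1}^{n}\xi_{in} \xrightarrow{d} N(0,\ f_W(0)\,E(Z^{2} \mid W = 0)\int L(v)^{2}\,dv)$.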

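PROOF OF LEMMA 1: (a) Apply Lemma A1 with $Z_i = \Delta x_i^{j}\,\Delta x_i^{l}\,\Phi_i$ ($j, l = 1, \dots, k$), $s = 0$, and $L(v) = K(v)$.

(b-i) Apply Lemma A2 with $\xi_{in} = c'(1/\sqrt{h_n})\,K(W_i/h_n)\,\Delta x_i'\,\Delta v_i\,\Phi_i$, where $c$ is a $k \times 1$ vector of constants such that $c'c = 1$.

(b-ii) Note that, by Assumption R5, $\Delta\lambda_i = \Lambda_i W_i$. Thus we may write

$$S_{x\lambda} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}\,K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\,\Lambda_i\,W_i\,\Phi_i.$$

Therefore $E(S_{x\lambda}) = \int (1/h_n)\,K(W/h_n)\,W\,g(W)\,dW$, where $g(W) \equiv E(\Delta x'\,\Lambda\,\Phi \mid W)\,f_W(W)$ is, by assumption, $r$ times continuously differentiable with derivatives that are bounded on the support of $W$ and has $g(0) < \infty$. A Taylor series expansion of $g(\cdot)$ around 0 and a change of variables $W = vh_n$ lead to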


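for some $c_n$ lying between 0 and $W$, since $\int v^{j}K(v)\,dv = 0$ for $j = 1, \dots, r$. Therefore, by bounded convergence,

$$\sqrt{nh_n}\,E(S_{x\lambda}) \to \tilde h\,\Sigma_{x\lambda},$$

since under our assumptions $\int |v|^{r+1}|K(v)|\,dv < \infty$ and, by assumption, $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$. Furthermore, by Lemma A1, $\mathrm{var}(S_{x\lambda}) = O(h_n^{2}/(nh_n))$, which implies that $\mathrm{var}(\sqrt{nh_n}\,S_{x\lambda}) = O(nh_n)\,O(h_n/n) = O(h_n^{2}) = o(1)$. Hence $\sqrt{nh_n}\,S_{x\lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\lambda}$.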

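(c-i) Note that $E(S_{xv}) = 0$, while by Lemma A1, $\mathrm{var}(S_{xv}) = O((nh_n)^{-1})$. Therefore $E(h_n^{-(r+1)}S_{xv}) = 0$ and $\mathrm{var}(h_n^{-(r+1)}S_{xv}) = O(h_n^{-2(r+1)}(nh_n)^{-1}) = O((nh_n^{2(r+1)+1})^{-1}) = o(1)$, since by assumption $\sqrt{nh_n}\,h_n^{r+1} \to \infty$ as $n \to \infty$. Thus $h_n^{-(r+1)}S_{xv} \xrightarrow{p} 0$.

(c-ii) From part (b-ii) above,

$$E(h_n^{-(r+1)}S_{x\lambda}) \to \Sigma_{x\lambda}$$

and

$$\mathrm{var}(h_n^{-(r+1)}S_{x\lambda}) = O\!\left((nh_n^{2r+1})^{-1}\right) = o(1),$$

since $nh_n^{2(r+1)+1} \to \infty$ implies that $nh_n^{2r+1} \to \infty$. Thus $h_n^{-(r+1)}S_{x\lambda} \xrightarrow{p} \Sigma_{x\lambda}$.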

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$.

(iii) In the Taylor series expansions, $c$ stands for a generic value between U and

PROOF OF LEMMA 2: (a) By a Taylor series expansion we can write

Therefore

since by assumption $\mu < p/2$, $|K'(v)| < M$, and $E(\|\Delta w\|\,\|\Delta x\|^{2}) < \infty$.


(b-i) Let $\hat s_{xv}^{l}$ and $s_{xv}^{l}$ denote the $l$th ($l = 1, \dots, k$) elements of $\hat S_{xv}$ and $S_{xv}$, respectively. A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that $A_1$ and $A_2$ are $O_p(1)$, while $A_3 = o_p(1)$. The desired result will then follow from the fact that $\mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$.

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let $\hat s_{x\lambda}^{l}$ and $s_{x\lambda}^{l}$ denote the $l$th ($l = 1, \dots, k$) elements of $\hat S_{x\lambda}$ and $S_{x\lambda}$, respectively. A third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

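We will show that $B_1$ and $B_2$ are $O_p(1)$, while $B_3 = o_p(1)$. The desired result will then follow from the fact that $1 - 2p < \mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$ and $\sqrt{nh_n}(\hat\gamma_n - \gamma) = O_p(n^{(1-\mu)/2 - p}) = o_p(1)$.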


Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that, for $r \geq 1$,

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+


where $A_1$ and $A_2$ are defined as in the proof of part (b-i). As we showed there, both these quantities are bounded in probability for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$ as $n$ increases. Furthermore, from (1) above, $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$. Thus the first two terms of the sum above are $o_p(1)$. Now, by (2),

(c-ii) Let $\hat S^l_{x\lambda}$ and $S^l_{x\lambda}$ be defined as before. A third order Taylor series expansion yields

where $B_1$ and $B_2$ are defined as in the proof of part (b-ii) and, as we showed there, they are bounded in probability for any $h_n$ that satisfies $nh_n \to \infty$ as $n$ increases. Thus the first two terms of the sum above are $o_p(1)$. Furthermore,

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables, and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.


Page 11: Estimation of a Panel Data Sample Selection Model ... · The JSTOR Archive is a trusted digital repository providing for long-term preservation and access to leading academic journals


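ASSUMPTION R5: The unknown function$^{9}$ $\Lambda(w_t\gamma + \eta, w_\tau\gamma + \eta, \zeta) \equiv E(\varepsilon^*_t \mid d_t = 1, d_\tau = 1, \zeta) = E(\varepsilon^*_t \mid u_t \le w_t\gamma + \eta, u_\tau \le w_\tau\gamma + \eta, \zeta)$ satisfies $\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \tilde\Lambda \cdot (s_t - s_\tau)$ for $t, \tau = 1, 2$, where $\tilde\Lambda$ is a function of $(s_t, s_\tau, \zeta)$, i.e., $\tilde\Lambda \equiv \tilde\Lambda(s_t, s_\tau, \zeta)$, which is bounded on its support.

This assumption is crucial to our analysis. It will be satisfied, for example, if $\Lambda$ is continuously differentiable with respect to its first two arguments, with bounded first-order partial derivatives (as, for example, when the errors are jointly normally distributed), in which case we may apply the multivariate mean-value theorem:

$$\Lambda(s_t, s_\tau, \zeta) - \Lambda(s_\tau, s_t, \zeta) = \left[\Lambda^{(1)}(c^*) - \Lambda^{(2)}(c^*)\right](s_t - s_\tau).$$

Here $\Lambda^{(j)}$ ($j = 1, 2$) denotes the first-order partial derivative of $\Lambda$ with respect to its first and second argument, respectively, and $c^*$ lies on the line segment connecting $(w_t\gamma + \eta, w_\tau\gamma + \eta, \zeta)$ and $(w_\tau\gamma + \eta, w_t\gamma + \eta, \zeta)$. Thus, in this case, $\tilde\Lambda = \Lambda^{(1)}(c^*) - \Lambda^{(2)}(c^*)$ and by assumption will be bounded.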

ASSUMPTION R6: (a) $x_t$ and $\varepsilon^*_t$ have bounded $4 + 2\delta$ moments conditional on $W$, for any $\delta \in (0, 1)$.

(b) $E(\Delta x'\Delta x \mid W)$ and $E(\Delta x'\Delta x\,\Delta\nu^2 \mid W)$ are continuous at $W = 0$ and do not vanish.

(c) $E(\Delta x'\tilde\Lambda \mid W)$ is almost everywhere $r$ times continuously differentiable as a function of $W$ and has bounded derivatives.

ASSUMPTION R7: The function $K: \mathbb{R} \to \mathbb{R}$ satisfies (a) $\int K(v)\,dv = 1$, (b) $\int |K(v)|\,dv < \infty$, (c) $\sup_v |K(v)| < \infty$, (d) $\int |v|^{r+1}|K(v)|\,dv < \infty$, and (e) $\int v^j K(v)\,dv = 0$ for all $j = 1, \ldots, r$.

ASSUMPTION R8: $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$.
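The moment conditions of Assumption R7 are easy to check numerically for a candidate kernel. The sketch below is illustrative only: it uses the Gaussian-based fourth-order kernel $(3/2 - v^2/2)\phi(v)$, which is a standard textbook construction and an assumption here, not necessarily one of the kernels used in the paper. The function names are hypothetical.

```python
import numpy as np


def gaussian_pdf(v):
    """Standard normal density phi(v)."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)


def fourth_order_gaussian_kernel(v):
    """(3/2 - v^2/2) * phi(v): an illustrative fourth-order kernel (r = 3)."""
    return (1.5 - 0.5 * v**2) * gaussian_pdf(v)


def kernel_moment(kernel, j, lo=-12.0, hi=12.0, n_points=200_001):
    """Approximate the integral of v**j * K(v) by the trapezoidal rule."""
    v = np.linspace(lo, hi, n_points)
    y = v**j * kernel(v)
    dv = v[1] - v[0]
    return float(np.sum(0.5 * (y[1:] + y[:-1])) * dv)


# R7(a): the kernel integrates to one; R7(e): moments 1..r vanish for r = 3.
moments = [kernel_moment(fourth_order_gaussian_kernel, j) for j in range(5)]
```

For this kernel the zeroth moment is one, the first three moments vanish, and the fourth does not, so it satisfies R7(e) with $r = 3$. It also takes negative values for $|v| > \sqrt{3}$, which illustrates why a bias-reducing kernel of order higher than two cannot be a density.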

From our analysis in Section 2, it is easy to see that Assumptions R1-R3 would suffice to identify $\beta$ for known $\gamma$. An identification scheme in the spirit of our discussion in Section 2 would obviously require support of $W$ at zero, as well as nonsingularity of the matrix $\Sigma_{xx}$ imposed by Assumption R3, analogous to the familiar full rank assumption.

The continuity of the distribution of the index $W$, imposed in Assumption R4, is a regularity condition common in kernel estimation of density and regression functions. It is precisely this continuity that renders the estimator of $\beta$ of Section 2 infeasible, even if $\gamma$ were known.

9 Notice that by Assumption R1 the functional form of $\Lambda$ is the same over time for the same individual, while by Assumption R2 it is also the same across individuals.

10 In principle we could dispense with the assumption that $\tilde\Lambda$ is bounded by assuming that $\tilde\Lambda$ has finite fourth moment conditional on $W$.


Since our estimation scheme is based on pairs of observations for which $W \equiv \Delta w\gamma \cong 0$, it is obvious that additional smoothness conditions are required.

These are imposed by Assumptions R4-R8. Notice in particular Assumption R5, which imposes a "Lipschitz continuity" property on the selection correction function $\Lambda(\cdot)$. It is easy to see that simple continuity will not be sufficient to guarantee that $\Delta\lambda_i \to 0$ as $W_i \to 0$, since $\Delta\lambda_i$ is not a function of $W_i$. Furthermore, similarly to kernel density and regression estimation, a high order of differentiability $r$ for certain functions of the index $W$, along with the appropriate choice of the kernel function and the bandwidth sequence, imply a faster rate of convergence in distribution for $\tilde\beta_n$. Specifically, we choose an $(r+1)$th order bias-reducing kernel, which by Assumption R7(e) is required to be negative in part of its domain.

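The next lemma establishes the asymptotic properties of the infeasible estimator $\tilde\beta_n$.

LEMMA 1: Let Assumptions R1-R8 hold. Define

$$\Sigma_{xx} \equiv f_W(0)\,E(\Delta x'\Delta x \mid W = 0),$$

$$\Sigma_{x\nu} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Delta\nu^2 \mid W = 0)\int K(v)^2\,dv,$$

$$\Sigma_{x\lambda} \equiv \frac{1}{r!}\,g^{(r)}(0)\int v^{r+1}K(v)\,dv,$$

where $g^{(r)}(0)$ is the $(k \times 1)$ vector of $r$th-order derivatives of $g(W) \equiv E(\Delta x'\tilde\Lambda \mid W)\,f_W(W)$ evaluated at $W = 0$. Then:

(a) $S_{xx} \xrightarrow{p} \Sigma_{xx}$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}\,S_{x\nu} \xrightarrow{d} N(0, \Sigma_{x\nu})$ and (ii) $\sqrt{nh_n}\,S_{x\lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\lambda}$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}S_{x\nu} \xrightarrow{p} 0$ and (ii) $h_n^{-(r+1)}S_{x\lambda} \xrightarrow{p} \Sigma_{x\lambda}$.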

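The asymptotic properties of $\tilde\beta_n$ easily follow from the previous Lemma. If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}\,(\tilde\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\tilde\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.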

In order to derive the asymptotic properties of the feasible estimator $\hat\beta_n$, we will make the following additional assumptions.

ASSUMPTION R9: In addition to the conditions of Assumption R7, the kernel function satisfies: (a) $K(v)$ is three times continuously differentiable with bounded derivatives, and (b) $\int |K'(v)|\,dv$, $\int |K''(v)|\,dv$, $\int |vK'(v)|^2\,dv$, and $\int |vK''(v)|\,dv$ are finite.


The conditions of Assumption R9 are satisfied, for example, for $K(v)$ being the standard normal density function, which is a second order kernel.

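ASSUMPTION R10: $x_t$, $\varepsilon^*_t$, and $w_t$ have bounded $8 + 4\delta$ moments conditional on $W$, for some $\delta \in (0, 1)$. In addition, $E(\Delta x^l\,\Delta\nu\,\Delta w^j \mid W)$ and $E(\Delta x^l\,\Delta\nu\,\Delta w^j\,\Delta w^m \mid W)$ are continuous at $W = 0$ for all $l = 1, \ldots, k$ and $j, m = 1, \ldots, q$.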

ASSUMPTION R11: The parameter vector $\gamma$ in the selection equation lies in a compact$^{11}$ set, and $\hat\gamma_n$ is a consistent estimator that satisfies $\hat\gamma_n - \gamma = O_p(n^{-p})$, where $2/5 < p \le 1/2$.

For example, $p = 1/2$ if $\gamma$ is estimated by maximizing the conditional likelihood function.

ASSUMPTION R12: $h_n = h\,n^{-\mu}$, where $0 < h < \infty$ and $1 - 2p < \mu < p/2$.
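Assumptions R11 and R12 together restrict the bandwidth exponent $\mu$ to the interval $(1 - 2p,\ p/2)$, which is nonempty exactly when $p > 2/5$. The helper functions below are a hypothetical sketch of that arithmetic (the names are not from the paper); they also check whether the exponent $\mu = 1/(2(r+1)+1)$ discussed later in this section falls inside the interval.

```python
def admissible_mu_interval(p):
    """Open interval (1 - 2p, p/2) of bandwidth exponents allowed by R12.

    p is the first-step convergence exponent of Assumption R11 and must
    satisfy 2/5 < p <= 1/2, otherwise the interval would be empty.
    """
    if not (0.4 < p <= 0.5):
        raise ValueError("Assumption R11 requires 2/5 < p <= 1/2")
    return (1.0 - 2.0 * p, p / 2.0)


def candidate_mu(r):
    """Exponent mu = 1 / (2(r+1) + 1) for smoothness order r."""
    return 1.0 / (2 * (r + 1) + 1)


def candidate_mu_is_admissible(p, r):
    """True if 1 - 2p < 1/(2(r+1)+1) < p/2."""
    lower, upper = admissible_mu_interval(p)
    return lower < candidate_mu(r) < upper
```

With a root-$n$ first step ($p = 1/2$) the interval is $(0, 1/4)$, so $\mu = 1/5$ (the case $r = 1$) is admissible. With a slower first step, say $p = 0.42$, the choice $\mu = 1/9$ for $r = 3$ falls below the lower bound $0.16$ and is not allowed.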

Assumption R12 is crucial for establishing the result that follows. This result states that $\hat S_{xx}$, $\hat S_{x\nu}$, and $\hat S_{x\lambda}$ have the same probability limits as their infeasible counterparts $S_{xx}$, $S_{x\nu}$, and $S_{x\lambda}$, provided that the bandwidth sequence $h_n$ is chosen appropriately for any given rate of convergence of the first-step estimator, that is, for any given $p$, and for any degree of smoothness $r$.

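LEMMA 2: Let Assumptions R1-R12 hold. Then:

(a) $\hat S_{xx} - S_{xx} = o_p(1)$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}\,(\hat S_{x\nu} - S_{x\nu}) = o_p(1)$ and (ii) $\sqrt{nh_n}\,(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}(\hat S_{x\nu} - S_{x\nu}) = o_p(1)$ and (ii) $h_n^{-(r+1)}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.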

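Lemma 2 readily implies that if $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}\,(\hat\beta_n - \tilde\beta_n) = o_p(1)$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \tilde\beta_n) = o_p(1)$. Since $(\hat\beta_n - \beta) = (\hat\beta_n - \tilde\beta_n) + (\tilde\beta_n - \beta)$, we have the following theorem.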

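THEOREM 1: Let Assumptions R1-R12 hold.

(a) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then $\sqrt{nh_n}\,(\hat\beta_n - \beta) \xrightarrow{d} N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.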

11 Compactness of the parameter space is required for consistency of both Manski's estimator and the smoothed conditional maximum score estimator, while it is not required for the conditional maximum likelihood estimator. Notice, though, that since $\gamma$ can only be estimated up to scale, we can always normalize it so that it lies on the unit circle. Thus, the compactness assumption is not restrictive.


Thus, in the limit, the fact that we are using $\hat\gamma_n$ to estimate $\beta$ does not affect the asymptotic distribution of $\hat\beta_n$. The lower bound on $\mu$ imposed by Assumption R12 is the key for this result to hold. In words, this bound implies that $\beta$ is estimated at a rate slower than $\gamma$. Indeed, from Theorem 1, the rate of convergence of $\hat\beta_n$ is $(nh_n)^{-1/2} \propto n^{-(1-\mu)/2}$, which is obviously slower than $n^{-p}$ since $\mu > 1 - 2p$. Thus, in effect, Assumption R12 requires that $\sqrt{nh_n}\,(\hat\gamma_n - \gamma) = o_p(1)$.

In principle we could allow $\beta$ to be estimated at the same rate as $\gamma$. Thus, if $\sqrt{nh_n}\,(\hat\gamma_n - \gamma) = O_p(1)$ for $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, we obtain the following asymptotic representation, which may be easily derived from the analysis of Lemma 2(b) in the Appendix:

where

n

0= ( l h ~ ) ~ ( ~ i h )plim (ln) Ax Awi Ahi Qi i = 1

provided that E(dx l A W ~ ~ I W ) at W = O and vK(v) -+O asis continuous lvl -f m Asymptotic normality of fir may still be established if K i q - y ) has an asymptotic representation of the form Jnh (T i J - y ) = l

K c ~ ( A ~ Ad y ) + 0(1)~ At first glance it looks attractive to eliminate the asymptotic bias of fin by

choosing $h_n$ so that $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h = 0$, or equivalently by setting $\mu > 1/(2(r+1)+1)$. In that case, however, the rate of convergence of $\hat\beta_n$ is lower than when $\tilde h > 0$. Indeed, the rate of convergence in distribution of $\hat\beta_n$ is maximized by making $\mu$ as small as possible, that is, by setting $\mu = 1/(2(r+1)+1)$, in which case it becomes $n^{-(r+1)/(2(r+1)+1)}$. Thus, for $r$ large enough the estimator converges at a rate that can be arbitrarily close to $n^{-1/2}$, provided also that $\gamma$ is estimated fast enough, that is, provided $p > (r+1)/(2(r+1)+1)$.
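To make the rate statement concrete, the snippet below tabulates the exponent $(r+1)/(2(r+1)+1)$ for a few smoothness orders, using exact fractions. It is only an illustration of the arithmetic in the paragraph above; the function name is hypothetical.

```python
from fractions import Fraction


def best_rate_exponent(r):
    """Exponent a such that the estimator converges at rate n**(-a)
    when mu = 1/(2(r+1)+1); the same number is the lower bound on p."""
    return Fraction(r + 1, 2 * (r + 1) + 1)


# r = 1, 3, 5 correspond to second, fourth, and sixth order kernels.
exponents = {r: best_rate_exponent(r) for r in (1, 3, 5)}
```

The exponents are $2/5$, $4/9$, and $6/13$ for $r = 1, 3, 5$: they increase toward, but never reach, $1/2$, and each one is also the smallest first-step exponent $p$ compatible with that choice of $\mu$.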

Although the proposed estimator is asymptotically biased, it is possible to eliminate the asymptotic bias while maintaining the maximal rate of convergence, in the manner suggested by Bierens (1987).

COROLLARY: Let $\hat\beta_n$ be the estimator with window width $h_n = h\,n^{-1/(2(r+1)+1)}$, and $\hat\beta_{n,\delta}$ the estimator with window width $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, where $\delta \in (0, 1)$.

12 We can also derive an asymptotic representation for $\hat\beta_n$ in the case where $\gamma$ is estimated at a rate $n^{-p}$ that is slower than $(nh_n)^{-1/2}$. In this case we obtain $n^p(\hat\beta_n - \beta) = \Sigma_{xx}^{-1}\Omega\,n^p(\hat\gamma_n - \gamma) + o_p(1)$, which implies that $\hat\beta_n$ converges at the same rate as $\hat\gamma_n$, which is slower than the optimal rate obtained for the infeasible estimator $\tilde\beta_n$, that is, when $\gamma$ is known.


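Define

$$\hat{\hat\beta}_n \equiv \frac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\,\hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}}.$$

Then $n^{(r+1)/(2(r+1)+1)}(\hat{\hat\beta}_n - \beta) \xrightarrow{d} N(0,\ h^{-1}\Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1})$.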
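The linear combination in the Corollary is simple to compute once the two estimates are available. The sketch below assumes the reading of the Corollary in which the weight on the second estimator is $n^{-(1-\delta)(r+1)/(2(r+1)+1)}$; the function and argument names are hypothetical.

```python
import numpy as np


def bias_corrected_estimate(beta_n, beta_n_delta, n, r, delta):
    """Combine two kernel-weighted estimates to cancel the leading bias.

    beta_n       : estimate with bandwidth h * n**(-1/(2(r+1)+1))
    beta_n_delta : estimate with bandwidth h * n**(-delta/(2(r+1)+1))
    n            : sample size
    r            : smoothness order (the kernel is of order r + 1)
    delta        : number in (0, 1)
    """
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie strictly between 0 and 1")
    weight = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    beta_n = np.asarray(beta_n, dtype=float)
    beta_n_delta = np.asarray(beta_n_delta, dtype=float)
    return (beta_n - weight * beta_n_delta) / (1.0 - weight)
```

The weight is chosen so that, if each estimate had bias exactly proportional to its bandwidth raised to the power $r+1$, the two bias terms would cancel: the wider-bandwidth estimate carries the larger bias, and it is scaled down by exactly the ratio of the two.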

In order to compute $\hat\beta_n$ or $\hat{\hat\beta}_n$ in an application, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator will likely be more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant $h$. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.

For a given order of differentiability $r$ and a given sample size $n$, the results of Theorem 1 suggest that $h_n = h\,n^{-\mu}$ be chosen so that $\mu = 1/(2(r+1)+1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant $h$. A natural way to proceed (see Horowitz (1992) and Härdle (1990)) is to choose $h$ so as to minimize some kind of measure of the distance of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

$$MSE = h^{2(r+1)}\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda} + h^{-1}\operatorname{trace}\left[A\,\Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1}\right]$$

for any nonstochastic positive semidefinite matrix $A$ that satisfies $\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda} \neq 0$. It is straightforward to show that MSE is minimized by setting

$$(3.21)\qquad h = h^* \equiv \left[\frac{\operatorname{trace}\left[A\,\Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1}\right]}{2(r+1)\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda}}\right]^{1/(2(r+1)+1)}.$$
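As a numerical illustration of equation (3.21), the sketch below computes the bandwidth constant that minimizes a squared-bias-plus-variance criterion of the form $h^{2(r+1)}\,b'Ab + h^{-1}\operatorname{trace}[AV]$, where $b = \Sigma_{xx}^{-1}\Sigma_{x\lambda}$ and $V = \Sigma_{xx}^{-1}\Sigma_{x\nu}\Sigma_{xx}^{-1}$. That form of the criterion is an assumption inferred from Theorem 1 and the Corollary, and the function names are hypothetical.

```python
import numpy as np


def _bias_and_variance(sigma_xx, sigma_xnu, sigma_xlam):
    """Return b = Sxx^{-1} Sxlam and V = Sxx^{-1} Sxnu Sxx^{-1}."""
    inv_xx = np.linalg.inv(np.asarray(sigma_xx, dtype=float))
    bias = inv_xx @ np.asarray(sigma_xlam, dtype=float)
    var = inv_xx @ np.asarray(sigma_xnu, dtype=float) @ inv_xx
    return bias, var


def asymptotic_mse(h, r, sigma_xx, sigma_xnu, sigma_xlam, a):
    """Assumed criterion: h^{2(r+1)} b'Ab + trace(A V) / h."""
    bias, var = _bias_and_variance(sigma_xx, sigma_xnu, sigma_xlam)
    a = np.asarray(a, dtype=float)
    return h ** (2 * (r + 1)) * float(bias @ a @ bias) + float(np.trace(a @ var)) / h


def optimal_bandwidth_constant(r, sigma_xx, sigma_xnu, sigma_xlam, a):
    """Minimizer of the criterion above over h > 0 (first-order condition)."""
    bias, var = _bias_and_variance(sigma_xx, sigma_xnu, sigma_xlam)
    a = np.asarray(a, dtype=float)
    squared_bias = float(bias @ a @ bias)
    if squared_bias <= 0.0:
        raise ValueError("the weighted squared bias term must be positive")
    ratio = float(np.trace(a @ var)) / (2 * (r + 1) * squared_bias)
    return ratio ** (1.0 / (2 * (r + 1) + 1))
```

In practice the three matrices would be replaced by consistent estimates, and $A$ could be taken to be the identity matrix. The closed form follows from setting the derivative of the criterion with respect to $h$ to zero.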

This last expression suggests that we may construct a consistent estimate of $h^*$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{x\nu}$, and $\Sigma_{x\lambda}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{x\nu}$ and $\Sigma_{x\lambda}$.

THEOREM 2: Assume that Assumptions R1-R12 hold. (a) Let $\hat\beta_n$ be a consistent estimator of $\beta$ based on $h_n = h\,n^{-1/(2(r+1)+1)}$, and define $\hat e_i \equiv \Delta y_i - \Delta x_i\hat\beta_n$.$^{13}$

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's world wide web page.


Then

(b) Let $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then, for $\hat e_i$ defined as in part (a),

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given $r$ and $n$, choose any $h_n = h\,n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, with $h$ an arbitrary positive constant and $0 < \delta < 1$. Compute $\hat\beta_n$ based on $h_n$ and construct $\hat e_i$ as defined in Theorem 2. Use $\hat e_i$ to compute the estimates of $\Sigma_{xx}$, $\Sigma_{x\nu}$, and $\Sigma_{x\lambda}$ as discussed above. Then estimate $h^*$ by $\hat h^*$ using equation (3.21), with $\Sigma_{xx}$, $\Sigma_{x\nu}$, and $\Sigma_{x\lambda}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h^*$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the "plug-in" method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant $h$. Second, by specifying the order of differentiability $r$, the researcher is restricted to a certain smoothness class.

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h\,n^{-1/(2(r+1)+1)}$, run an OLS regression of $\tilde y_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta y_i\,\Phi_i$ on $\tilde x_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta x_i\,\Phi_i$ and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix, using the residuals from the regression, $\hat e_i = \tilde y_i - \tilde x_i\hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\tilde y$ on $\tilde x$ with bandwidth $h_n = h_{n,\delta}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$ in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency$^{14}$ of $\hat\beta_n$ if $h_n^{-1}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e., for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}\,(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$.

The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-$n$ asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.
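The kernel-weighted OLS computation described earlier in this section can be sketched in a few lines. The code below is an illustrative sketch rather than the author's implementation: it assumes a nonnegative kernel (the standard normal density, so that the square root of the weight is real), takes the first-step estimate and the bandwidth as given, and uses hypothetical function and argument names. Higher-order kernels take negative values, in which case one would solve the weighted normal equations directly instead of rescaling by square roots.

```python
import numpy as np


def gaussian_kernel(v):
    """Standard normal density, a second order kernel."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)


def kernel_weighted_ols(dy, dx, dw, gamma_hat, h_n, phi):
    """Kernel-weighted OLS on first differences with Eicker-White covariance.

    dy        : (n,)   first difference of the outcome
    dx        : (n, k) first differences of the main-equation regressors
    dw        : (n, q) first differences of the selection-equation regressors
    gamma_hat : (q,)   first-step estimate of the selection parameters
    h_n       : positive bandwidth
    phi       : (n,)   1 if the individual is observed in both periods, else 0
    """
    index = dw @ gamma_hat                      # estimated index difference
    weights = gaussian_kernel(index / h_n) * phi
    root_w = np.sqrt(weights)                   # valid because weights >= 0
    y_t = root_w * dy
    x_t = root_w[:, None] * dx
    xtx_inv = np.linalg.inv(x_t.T @ x_t)
    beta_hat = xtx_inv @ (x_t.T @ y_t)
    resid = y_t - x_t @ beta_hat
    meat = (x_t * (resid**2)[:, None]).T @ x_t  # sum of e_i^2 x_i' x_i
    cov = xtx_inv @ meat @ xtx_inv              # sandwich covariance of beta_hat
    return beta_hat, cov
```

Standard errors are the square roots of the diagonal of the returned covariance matrix. Running the same function with the second bandwidth $h_{n,\delta}$ gives the auxiliary estimate needed for the bias correction.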

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3}h_n \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator, which under weak (but stronger than Manski's) assumptions is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness

14 Consistency of p may be established under the weaker restriction that zllF - yll = o(l) The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion This modification does not alter the basic restriction for obtaining an asymptotic distribution for 6which does not depend on the estimation of y in the first step namely that y has to be estimated at a faster rate than p Notice that in this case the upper bound on u in Assumption R12 would have to be replaced by ( 6p- 1)7 However this modification would affect the proof of Theorem 2 which would become unnecessarily complicated and long


we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, the first step estimation method, the performance in practice of the proposed plug-in method for estimating the bandwidth constant, and finally the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where p O = 1 y = y = 1 w and w2 are independent N( -1 l ) variables q = (w + w)2 + 25 with 5 an independent variable distributed uni- formly over the interval (01) u is logistically distributed normalized to have variance equal to 1 x= w a = + w )2 + 5 with 5 an indepen- (w dent N(0 2) variable and s = 08t3 + 06ul with 5 an independent standard normal variable All data are generated iid across individuals and over time This design implies that Pr(d + d = 1)= 037 and Pr(d = d = 1) = 031 so that approximately 37 percent of each sample is used in the first step estimation of the selection equation and approximately 31 percent in the second step Each Monte Carlo experiment is performed 1000 times while the same pseudoran- dom number sequences are used for each one of three different sample sizes n 250 1000 and 4000

Table I presents the finite sample properties of the "naive" estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences, using only those individuals that are selected into the sample in both time periods, i.e., those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median


TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$ (columns: Mean Bias, Median Bias, RMSE, MAD)

Panel B: Sizes of t tests (columns: 0.01, 0.05, 0.10, 0.20)

bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value, $\beta = 1$, at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(v) = \phi(v)$, where $\phi$ is the density of the standard normal distribution,

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Left block: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right block: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction). Columns in each block: Mean Bias, Median Bias, RMSE, MAD.

Panel A (True $\gamma$): 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B ($\hat\gamma_L$): 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C ($\hat\gamma_{CMS}$): 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D ($\hat\gamma_{SCMS4}$): 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E ($\hat\gamma_{SCMS2}$): 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098


which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h\,n^{-1/(2(r+1)+1)} = h\,n^{-1/5}$ with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.$^{15}$ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_L$, which in this case will be consistent, since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,$^{16}$ denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator, denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E), the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window

15 In the construction of the kernel weights of both the infeasible estimator $\tilde\beta_n$ of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_i \Delta d_i\,1\{\Delta w_{1i}\,g_1 + \Delta w_{2i}\,g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.
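The grid search described in this footnote can be sketched as follows. This is a minimal illustration, not the code used in the paper: the objective is written as the sample mean of $\Delta d_i\,1\{\Delta w_i g \ge 0\}$, which is one common way of writing the conditional maximum score criterion and is an assumption here, and the function name is hypothetical.

```python
import numpy as np


def conditional_max_score(dd, dw, n_grid=2000):
    """Grid-search conditional maximum score estimate for two regressors.

    dd     : (n,)   change in the selection indicator, d_i2 - d_i1
    dw     : (n, 2) change in the two selection-equation regressors
    n_grid : number of equispaced angles between 0 and 2*pi

    Returns the unit-norm vector (sin g, cos g) with the highest score,
    together with that score. Ties are broken by the first grid point.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid)
    directions = np.column_stack((np.sin(angles), np.cos(angles)))
    index = dw @ directions.T                      # shape (n, n_grid)
    scores = (dd[:, None] * (index >= 0.0)).mean(axis=0)
    best = int(np.argmax(scores))
    return directions[best], float(scores[best])
```

Because the search is over the unit circle, the estimate is normalized to have norm one, which is the scale normalization the text relies on. Only individuals whose selection status changes between periods contribute to the score.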

17 The SCMS estimates are computed by maximizing the smoothed conditional maximum score objective function over all $g \in \mathbb{R}^2$ that have $|g_1| = 1$ and $g_2$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(v) = K_4(v)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(v) = \Phi(v)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5\,n^{-1/(2m+1)}$ ($m$ = 2 or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).


width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.
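The "less than half" remark can be checked with a line of arithmetic. The sketch below assumes that the RMSE is proportional to $n^{-(r+1)/(2(r+1)+1)}$, which is $n^{-2/5}$ for the second order kernel ($r = 1$) used in Table II; the helper name `rmse_ratio` is hypothetical.

```python
def rmse_ratio(sample_multiplier, r):
    """Factor by which the RMSE shrinks when n is multiplied by sample_multiplier,
    assuming RMSE is proportional to n ** (-(r + 1) / (2 * (r + 1) + 1))."""
    rate = (r + 1) / (2 * (r + 1) + 1)
    return sample_multiplier ** (-rate)

# Quadrupling n with a second order kernel (r = 1): rate n^(-2/5).
ratio_kernel = rmse_ratio(4, r=1)
# Benchmark: a root-n consistent estimator halves its RMSE.
ratio_root_n = 4 ** (-0.5)
print(round(ratio_kernel, 4), ratio_root_n)  # prints 0.5743 0.5
```

Under this assumption the RMSE falls to roughly 57 percent of its previous value when the sample size is quadrupled, rather than to 50 percent as it would for a root-n consistent estimator.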

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\hat\beta_n$ and $\hat{\hat\beta}_n$ using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B) (see footnote 18).

In Panels C and D we use a fourth and a sixth order bias-reducing kernel (see footnote 19) and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h$ is larger than the initial constant (here the initial bandwidth constant is set equal to one; see footnote 20). Table V displays the mean of $\hat h$ across 1000 replications for different specifications of the initial constant, for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here, the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias, we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

19 The fourth-order kernel is $K_4(\nu) = 1.1\exp(-\nu^2/2) - 0.1\exp(-\nu^2/(2\cdot 11))(1/\sqrt{11})$ and the sixth-order kernel is $K_6(\nu) = 1.5\exp(-\nu^2/2) - 0.6\exp(-\nu^2/(2\cdot 4))(1/\sqrt{4}) + 0.1\exp(-\nu^2/(2\cdot 9))(1/\sqrt{9})$. See Bierens (1987).
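A quick numerical check of the bias-reducing property of these kernels is sketched below. It assumes that each exponential term in footnote 19 carries the usual Gaussian normalizing factor $1/\sqrt{2\pi}$ (so that the terms are normal densities with variances 1, 11, 4, and 9) and it uses a simple midpoint rule; the function names are hypothetical.

```python
import math

def normal_pdf(v, variance=1.0):
    """Density of N(0, variance) at v."""
    return math.exp(-v * v / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def k2(v):
    # Second order kernel: standard normal density.
    return normal_pdf(v)

def k4(v):
    # Fourth order kernel: 1.1 * N(0, 1) density - 0.1 * N(0, 11) density.
    return 1.1 * normal_pdf(v) - 0.1 * normal_pdf(v, 11.0)

def k6(v):
    # Sixth order kernel: 1.5 * N(0, 1) - 0.6 * N(0, 4) + 0.1 * N(0, 9) densities.
    return 1.5 * normal_pdf(v) - 0.6 * normal_pdf(v, 4.0) + 0.1 * normal_pdf(v, 9.0)

def moment(kernel, j, lo=-60.0, hi=60.0, steps=24000):
    """Midpoint-rule approximation of the integral of v**j * kernel(v)."""
    dv = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        v = lo + (i + 0.5) * dv
        total += (v ** j) * kernel(v)
    return total * dv

# Each kernel integrates to one; an m-th order kernel has vanishing moments
# of orders 1, ..., m - 1 and a nonzero m-th moment (odd moments vanish by symmetry).
moments_k4 = [moment(k4, j) for j in (0, 2, 4)]
moments_k6 = [moment(k6, j) for j in (0, 2, 4, 6)]
```

Using the normal moments $E(\nu^2) = \sigma^2$, $E(\nu^4) = 3\sigma^4$, and $E(\nu^6) = 15\sigma^6$, the second moment of $K_4$ is $1.1 - 0.1\cdot 11 = 0$, while the second and fourth moments of $K_6$ are $1.5 - 0.6\cdot 4 + 0.1\cdot 9 = 0$ and $3(1.5 - 0.6\cdot 16 + 0.1\cdot 81) = 0$, which is what the numerical moments should reproduce up to integration error.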

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size n = 100000.


TABLE III

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: TRUE $\gamma$

$\hat\beta_n$ (Without Asymptotic Bias Correction); $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)

Mean Bias, Median Bias, RMSE, MAD; Mean Bias, Median Bias, RMSE, MAD

Panel A: $K(\nu) = \phi(\nu)$, $h_n = 0.5\cdot n^{-1/5}$
0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B: $K(\nu) = \phi(\nu)$, $h_n = 3\cdot n^{-1/5}$
0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C: $K(\nu) = K_4(\nu)$, $h_n = n^{-1/9}$
0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D: $K(\nu) = K_6(\nu)$, $h_n = n^{-1/13}$
0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for n = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

b The estimated standard errors of the median bias estimates for n = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = \hat h\cdot n^{-1/5}$, INITIAL $h = 1$, $K(\nu) = \phi(\nu)$

$\hat\beta_n$ (Without Asymptotic Bias Correction); $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)

Mean Bias, Median Bias, RMSE, MAD; Mean Bias, Median Bias, RMSE, MAD

Panel A: True $\gamma$
0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B: $\hat\gamma_{L}$
0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C: $\hat\gamma_{CMS}$
0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D: $\hat\gamma_{SCMS4}$
0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401


TABLE V

Initial h = 0.5; Initial h = 1; Initial h = 2; Initial h = 3

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the

[Figure 1: quantile-quantile plots; panel row "True $\gamma$" (Fig. 1a, Fig. 1b, Fig. 1c).]

Note: Figures 1a, 1d, 1g: n = 250. Figures 1b, 1e, 1h: n = 1000. Figures 1c, 1f, 1i: n = 4000.

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a Normal: $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$.


estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.
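The construction of such a quantile-quantile plot can be sketched as follows. The plotting positions $(i + 0.5)/n$, the function name `qq_points`, and the five illustrative draws are assumptions made for this sketch; the paper does not state which plotting positions were used.

```python
from statistics import NormalDist, mean, stdev

def qq_points(estimates):
    """Pair each ordered estimate with the matching quantile of a normal
    distribution that has the same mean and (sample) standard deviation."""
    ordered = sorted(estimates)
    n = len(ordered)
    reference = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n avoid the infinite 0 and 1 quantiles.
    return [(reference.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]

# Hypothetical Monte Carlo draws of the estimator (illustration only).
draws = [0.8, 0.9, 1.0, 1.1, 1.2]
points = qq_points(draws)
```

If the finite sample distribution is close to normal, the pairs in `points` lie close to the 45-degree line.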

Finally, we examine the size of t tests, where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$. To this end we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
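The rejection frequencies reported in a table of this kind can be computed as in the sketch below. It assumes two-sided tests with standard normal critical values; the function name `empirical_size` and the four replications are hypothetical and serve only to illustrate the calculation.

```python
from statistics import NormalDist

def empirical_size(estimates, std_errors, null_value, level):
    """Fraction of replications in which a two-sided t test rejects
    H0: beta = null_value at the given nominal level (normal critical values)."""
    critical = NormalDist().inv_cdf(1.0 - level / 2.0)
    rejections = sum(
        1 for b, se in zip(estimates, std_errors)
        if abs((b - null_value) / se) > critical
    )
    return rejections / len(estimates)

# Hypothetical replications (illustration only): t statistics near 0, 5, -10, 1.
estimates = [1.0, 1.5, 0.0, 1.1]
std_errors = [0.1, 0.1, 0.1, 0.1]
sizes = {level: empirical_size(estimates, std_errors, 1.0, level)
         for level in (0.01, 0.05, 0.10, 0.20)}
```

In a Monte Carlo study the lists would hold one estimate and one standard error per replication, and each entry of `sizes` would be compared with its nominal level.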

TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$

$\hat\beta_n$ (Without Asymptotic Bias Correction); $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)

0.01, 0.05, 0.10, 0.20; 0.01, 0.05, 0.10, 0.20

Panel A: True $\gamma$
0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B: $\hat\gamma_{L}$
0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C: $\hat\gamma_{CMS}$
0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D: $\hat\gamma_{SCMS4}$
0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250


5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues are also left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas which maintain Assumptions R4 and R8 of Section 3

LEMMAAl Let S = is a random sam- (ln)Z=l ( l h )L (M h )Z v s 2 0 where (Z y)]= ple from a disirrbuiron that has E ( I Z I ~ I for L ~at~sfiesW )ltM lt almost all W and the functron

P ~ l v(v ) l d v lt M Then E ( S ) = O ( k i ) and var(S) = O(hnh) Tlzus for s 2 1 S + 0 while for

P s = 0 S +f(O)E(ZI W = O)lL(v)d v procrded that E ( Z I W ) rs contrnuo~ts at W = 0


PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

LEMMAA2 (Liapounov CLT for doublc arrays) Let = (1 l t l )~= I tiwhere an Independent sequence of scalar random ~arrables that satis$es E( (I0 var( (I lt rn var(= +

V lt aand I3= El ( 61 +0 for some 8 E (01) as n + Then Jizh~N(0 V)

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1974).

COROLLARY = where (Z U)l= 1s a random sample from a Al Let ( ( I amp)L(w~)z d~stnbutlonsuch that E(ZI W) = 0 and E(IZI 1 W) lt M lt w for almost all W E(Z2 I W) IS conhnuous at W = 0 and the functlon L satrsfies llL(v)l dv lt 53 Then KS= ( l ix)~l=amp N(0

f W ( 0 ) ~ ( Z 2 I ~ =O ) ~ L ( V ) ~ ~ V )

PROOFOF LEMMA1 (a) Apply Lemma A1 with 2= Ax Ax di (1 j = I k) s = 0 and L(v ) = K(v)

(b-i) Apply Lemma A2 with tt= c1(1 amp)K(Uh) Ax Ac where c is a k X 1 vector of constants such that cc = 1

(b-ii) Note that by Assumption R5 Ah = AWThus wc may write

S = Ax( 1 ~ 1 ) I 3 ~ = ~ ( l h ) K ( H ( h )

Therefore E(S) = l(lh)K(Wh)Wg(W) dW where g(W) - E(Axr AlW)fw(W) is by assumption r times colltinuously differentiable with derivatives that are bounded on the support of W and has g(0) lt m A Taylor series expansion of g() around 0 and a change of variables W = vh lead to

17

1


for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$.

(iii) In the Taylor series expansions, c stands for a generic value between U and

PROOFOF LEMMA2 (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a


(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that A and A are 0(1) while A = o(l) The desired result will then follow from the fact that p lt p 2 implies that h i 1 ( - y ) = Op(niL-1= o(l)

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

We will show that Bl and B are 0(1) while B3 = o(l) Thc desired result will thenfollow from the fact that 1 - 2 p lt u lt p 2 implies that hi1(- y ) = Op(nL-) = o(l) and - y ) = o(n -~-) = o(l)


Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for r r 1

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+


where Ai and A are defined as in the proof of part (b-1) As we showed there both these quantities are bounded in probability for any h that satisfies h -O and nh -t 13 as n increases Furthermore from (1) above hi1( - y) = op(l) T ~ L I S OP(nF-) = the first two terms of the sum above are o(l) Now by (21

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where Bi and B2 are defined as in the proof of part (b-ii) and as we showed there they arc houndcd in probability for any I that satisfies nh + 13 as n increases Thus the first two terms of the sum above are o(l) Furthermore

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons--A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of A Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.

Estimation of a Panel Data Sample Selection Model. Ekaterini Kyriazidou. Econometrica, Vol. 65, No. 6 (Nov., 1997), pp. 1335-1364. Stable URL: http://links.jstor.org/sici?sici=0012-9682%28199711%2965%3A6%3C1335%3AEOAPDS%3E2.0.CO%3B2-B



Since our estimation scheme is based on pairs of observations for which $W_i \equiv \Delta w_i\gamma \cong 0$, it is obvious that additional smoothness conditions are required.

These are imposed by Assumptions R4-R8. Notice in particular Assumption R5, which imposes a Lipschitz continuity property on the selection correction function $\Lambda(\cdot)$. It is easy to see that simple continuity will not be sufficient to guarantee that $\Delta\lambda_i \to 0$ as $W_i \to 0$, since $\Delta\lambda_i$ is not a function of $W_i$. Furthermore, similarly to kernel density and regression estimation, a high order of differentiability $r$ for certain functions of the index $W$, along with the appropriate choice of the kernel function and the bandwidth sequence, implies a faster rate of convergence in distribution for $\hat\beta_n$. Specifically, we choose a $(r + 1)$th order bias-reducing kernel, which by Assumption R7(e) is required to be negative in part of its domain.

The next lemma establishes the asymptotic properties of the infeasible estimator $\tilde\beta_n$.

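LEMMA 1: Let Assumptions R1-R8 hold. Define

$$\Sigma_{xx} \equiv f_W(0)\,E(\Delta x'\Delta x \mid W = 0),$$

$$\Sigma_{xv} \equiv f_W(0)\,E(\Delta x'\Delta x\,\Delta v^2 \mid W = 0)\int K(\nu)^2\,d\nu,$$

$$\Sigma_{x\lambda} \equiv \frac{1}{r!}\,g^{(r)}(0)\int \nu^{r+1}K(\nu)\,d\nu,$$

where $g^{(r)}(0)$ is the $(k \times 1)$ vector of $r$th-order derivatives of

$$g(W) \equiv E(\Delta x'\Lambda \mid W)\,f_W(W)$$

evaluated at $W = 0$. Then:

(a) $\tilde S_{xx} \to_p \Sigma_{xx}$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}\,\tilde S_{xv} \to_d N(0, \Sigma_{xv})$ and (ii) $\sqrt{nh_n}\,\tilde S_{x\lambda} \to_p \tilde h\,\Sigma_{x\lambda}$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}\tilde S_{xv} \to_p 0$ and (ii) $h_n^{-(r+1)}\tilde S_{x\lambda} \to_p \Sigma_{x\lambda}$.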

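The asymptotic properties of $\tilde\beta_n$ easily follow from the previous Lemma. If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\tilde\beta_n - \beta) \to_d N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\tilde\beta_n - \beta) \to_p \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.

In order to derive the asymptotic properties of the feasible estimator $\hat\beta_n$, we will make the following additional assumptions.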

ASSUMPTION R9: In addition to the conditions of Assumption R7, the kernel function satisfies: (a) $K(\nu)$ is three times continuously differentiable with bounded derivatives, and (b) $\int|K'(\nu)|\,d\nu$, $\int|K''(\nu)|\,d\nu$, $\int|\nu K'(\nu)|^2\,d\nu$, and $\int|\nu K''(\nu)|\,d\nu$ are finite.


The conditions of Assumption R9 are satisfied, for example, for $K(\nu)$ being the standard normal density function, which is a second order kernel.

ASSUMPTION R10: $x_t^*$, $\varepsilon_t^*$, and $w_t$ have bounded $8 + 4\delta$ moments conditional on $W$, for some $\delta \in (0, 1)$. In addition, $E(\Delta x_l\,\Delta v\,\Delta w_j \mid W)$ and $E(\Delta x_l\,\Delta v\,\Delta w_j\,\Delta w_m \mid W)$ are continuous at $W = 0$ for all $l = 1, \ldots, k$ and $j, m = 1, \ldots, q$.

ASSUMPTION R11: The parameter vector $\gamma$ in the selection equation lies in a compact set (see footnote 11), and $\hat\gamma_n$ is a consistent estimator that satisfies $\hat\gamma_n - \gamma = O_p(n^{-p})$, where $2/5 < p \le 1/2$.

For example, $p = 1/2$ if $\gamma$ is estimated by maximizing the conditional likelihood function.

ASSUMPTION R12: $h_n = h\cdot n^{-\mu}$, where $0 < h < \infty$ and $1 - 2p < \mu < p/2$.

Assumption R12 is crucial for establishing the result that follows. This result states that $\hat S_{xx}$, $\hat S_{xv}$, and $\hat S_{x\lambda}$ have the same probability limits as their infeasible counterparts $\tilde S_{xx}$, $\tilde S_{xv}$, and $\tilde S_{x\lambda}$, provided that the bandwidth sequence $h_n$ is chosen appropriately for any given rate of convergence of the first-step estimator, that is, for any given $p$, and for any degree of smoothness $r$.

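LEMMA 2: Let Assumptions R1-R12 hold. Then:

(a) $\hat S_{xx} - \tilde S_{xx} = o_p(1)$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}(\hat S_{xv} - \tilde S_{xv}) = o_p(1)$ and (ii) $\sqrt{nh_n}(\hat S_{x\lambda} - \tilde S_{x\lambda}) = o_p(1)$.

(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}(\hat S_{xv} - \tilde S_{xv}) = o_p(1)$ and (ii) $h_n^{-(r+1)}(\hat S_{x\lambda} - \tilde S_{x\lambda}) = o_p(1)$.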

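Lemma 2 readily implies that if $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}(\hat\beta_n - \tilde\beta_n) = o_p(1)$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \tilde\beta_n) = o_p(1)$. Since $(\hat\beta_n - \beta) = (\hat\beta_n - \tilde\beta_n) + (\tilde\beta_n - \beta)$, we have the following theorem.

THEOREM 1: Let Assumptions R1-R12 hold.

(a) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then $\sqrt{nh_n}(\hat\beta_n - \beta) \to_d N(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\ \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.

(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \beta) \to_p \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.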

11 Compactness of the parameter space is required for consistency of both Manski's estimator and the smoothed conditional maximum score estimator, while it is not required for the conditional maximum likelihood estimator. Notice, though, that since $\gamma$ can only be estimated up to scale, we can always normalize it so that it lies on the unit circle. Thus, the compactness assumption is not restrictive.


Thus, in the limit, the fact we are using $\hat\gamma_n$ to estimate $\beta$ does not affect the asymptotic distribution of $\hat\beta_n$. The lower bound on $\mu$ imposed by Assumption R12 is the key for this result to hold. In words, this bound implies that $\beta$ is estimated at a rate slower than $\gamma$. Indeed, from Theorem 1 the rate of convergence of $\hat\beta_n$ is $(nh_n)^{-1/2} = n^{-(1-\mu)/2}$, which is obviously slower than $n^{-p}$ since $\mu > 1 - 2p$. Thus, in effect, Assumption R12 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$.

In principle, we could allow $\beta$ to be estimated at the same rate as $\gamma$. Thus, if $\sqrt{nh_n}(\hat\gamma_n - \gamma) = O_p(1)$, for $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ we obtain the following asymptotic representation, which may be easily derived from the analysis of Lemma 2(b) in the Appendix:

where

n

0= ( l h ~ ) ~ ( ~ i h )plim (ln) Ax Awi Ahi Qi i = 1

provided that E(dx l A W ~ ~ I W ) at W = O and vK(v) -+O asis continuous lvl -f m Asymptotic normality of fir may still be established if K i q - y ) has an asymptotic representation of the form Jnh (T i J - y ) = l

K c ~ ( A ~ Ad y ) + 0(1)~ At first glance it looks attractive to eliminate the asymptotic bias of fin by

choosing $h_n$ so that $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h = 0$, or equivalently by setting $\mu > 1/(2(r+1)+1)$. In that case, however, the rate of convergence of $\hat\beta_n$ is lower than when $\tilde h > 0$. Indeed, the rate of convergence in distribution of $\hat\beta_n$ is maximized by making $\mu$ as small as possible, that is, by setting $\mu = 1/(2(r+1)+1)$, in which case it becomes $n^{-(r+1)/(2(r+1)+1)}$. Thus, for $r$ large enough, the estimator converges at a rate that can be arbitrarily close to $n^{-1/2}$, provided also that $\gamma$ is estimated fast enough, that is, provided $p > (r+1)/(2(r+1)+1)$.
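The exponents in this paragraph can be tabulated for a few values of $r$. The sketch below is plain arithmetic on the formulas above, using exact fractions; the function name `rate_exponents` is hypothetical.

```python
from fractions import Fraction

def rate_exponents(r):
    """Exponents implied by the choice mu = 1 / (2 * (r + 1) + 1).

    Returns (mu, rate), where h_n is proportional to n ** (-mu) and the
    estimator converges at rate n ** (-rate); the first-step estimator must
    converge faster than n ** (-rate)."""
    mu = Fraction(1, 2 * (r + 1) + 1)
    rate = (1 - mu) / 2  # equals (r + 1) / (2 * (r + 1) + 1)
    return mu, rate

for r in (1, 3, 5, 50):
    mu, rate = rate_exponents(r)
    print(r, mu, rate, float(rate))
```

For $r = 1$, 3, and 5 the rate exponents are 2/5, 4/9, and 6/13, and they approach but never reach 1/2 as $r$ grows.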

Although the proposed estimator is asymptotically biased, it is possible to eliminate the asymptotic bias while maintaining the maximal rate of convergence, in the manner suggested by Bierens (1987).

COROLLARY: Let $\hat\beta_n$ be the estimator with window width $h_n = h\cdot n^{-1/(2(r+1)+1)}$, and $\hat\beta_{n,\delta}$ the estimator with window width $h_{n,\delta} = h\cdot n^{-\delta/(2(r+1)+1)}$, where $\delta \in (0, 1)$.

12 We can also derive an asymptotic representation for $\hat\beta_n$ in the case where $\gamma$ is estimated at rate $n^{-p}$ that is slower than $1/\sqrt{nh_n}$. In this case we obtain $n^{p}(\hat\beta_n - \beta) = \Sigma_{xx}^{-1}\Omega\,n^{p}(\hat\gamma_n - \gamma) + o_p(1)$, which implies that $\hat\beta_n$ converges at the same rate as $\hat\gamma_n$, which is slower than the optimal rate obtained for the infeasible estimator $\tilde\beta_n$, that is, when $\gamma$ is known.


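Define

$$\hat{\hat\beta}_n \equiv \frac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\,\hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}}.$$

Then $n^{(r+1)/(2(r+1)+1)}(\hat{\hat\beta}_n - \beta) \to_d N(0,\ h^{-1}\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.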
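The way the weighted combination removes the leading bias term can be illustrated with a stylized calculation. The sketch below assumes, purely for illustration, that each estimate equals the true value plus a bias proportional to the $(r+1)$th power of its bandwidth, with no sampling noise; the function name `bias_corrected` and all numerical values are hypothetical.

```python
def bias_corrected(beta_n, beta_n_delta, n, r, delta):
    """Combine two estimates, computed with bandwidths h * n**(-1/(2(r+1)+1))
    and h * n**(-delta/(2(r+1)+1)), so that the leading bias terms cancel."""
    weight = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    return (beta_n - weight * beta_n_delta) / (1.0 - weight)

# Stylized check: each estimate equals beta plus bias_const * bandwidth**(r + 1).
beta, bias_const, h, n, r, delta = 1.0, 0.7, 1.0, 1000, 1, 0.1
h_n = h * n ** (-1.0 / (2 * (r + 1) + 1))
h_n_delta = h * n ** (-delta / (2 * (r + 1) + 1))
beta_n = beta + bias_const * h_n ** (r + 1)
beta_n_delta = beta + bias_const * h_n_delta ** (r + 1)
corrected = bias_corrected(beta_n, beta_n_delta, n, r, delta)
```

The weight $n^{-(1-\delta)(r+1)/(2(r+1)+1)}$ equals $(h_n/h_{n,\delta})^{r+1}$, so the bias of $\hat\beta_{n,\delta}$ is scaled down to exactly the bias of $\hat\beta_n$ before the two are differenced. The value $\delta = 0.1$ matches the choice reported for the Monte Carlo experiments.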

In order to compute $\hat\beta_n$ or $\hat{\hat\beta}_n$ in an application, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator will likely be more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant $h$. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.

For a given order of differentiability $r$ and a given sample size $n$, the results of Theorem 1 suggest that $h_n = h\,n^{-\mu}$ be chosen so that $\mu = 1/(2(r+1)+1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant $h$. A natural way to proceed (see Horowitz (1992) and Härdle (1990)) is to choose $h$ so as to minimize some kind of measure of the distance of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

$$MSE = n^{-2(r+1)/(2(r+1)+1)}\left\{ h^{2(r+1)}\,\Sigma_{x\xi}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\xi} + h^{-1}\,\mathrm{trace}\left[\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}A\right]\right\}$$

for any nonstochastic positive semidefinite matrix $A$ that satisfies $\Sigma_{x\xi}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\xi} > 0$. It is straightforward to show that MSE is minimized by setting

(3.21) $$h = h^* = \left[\frac{\mathrm{trace}\left[\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}A\right]}{2(r+1)\,\Sigma_{x\xi}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\xi}}\right]^{1/(2(r+1)+1)}$$
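The minimization behind (3.21) can be spelled out in two lines. The following is only a sketch, and it rests on assumptions about notation that the extracted text does not fully confirm: the asymptotic bias is taken to be $\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\xi}$, the asymptotic variance $\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}$, and $h_n = h\,n^{-1/(2(r+1)+1)}$, so that $\tilde h = h^{(2(r+1)+1)/2}$. The shorthand $B$ and $V$ is introduced here purely for illustration.

```latex
\begin{align*}
B &\equiv \Sigma_{x\xi}'\Sigma_{xx}^{-1} A\,\Sigma_{xx}^{-1}\Sigma_{x\xi},
\qquad
V \equiv \operatorname{trace}\!\left[\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1} A\right], \\
MSE(h) &= n^{-2(r+1)/(2(r+1)+1)}\left[\,h^{2(r+1)} B + h^{-1} V\,\right], \\
\frac{\partial MSE}{\partial h} = 0
&\iff 2(r+1)\,h^{2(r+1)-1} B = h^{-2} V
\iff h^{*} = \left[\frac{V}{2(r+1)\,B}\right]^{1/(2(r+1)+1)} .
\end{align*}
```

Under these assumptions, a second order kernel ($r = 1$) gives the exponent $1/5$, so $h^*$ is the fifth root of $V/(4B)$: a larger variance term pushes the constant up, a larger squared-bias term pushes it down.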

This last expression suggests that we may construct a consistent estimate of $h^*$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\xi}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $n h_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{xv}$ and $\Sigma_{x\xi}$.

THEOREM 2:$^{13}$ Assume that Assumptions R1-R12 hold. (a) Let $\hat\beta_n$ be a consistent estimator of $\beta$ based on $h_n = h\,n^{-1/(2(r+1)+1)}$, and define $\hat e_i = \Delta y_i - \Delta x_i\hat\beta_n$.

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's world wide web page.


Then

(b) Let $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then, for $\hat e_i$ defined as in part (a),

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given $r$ and $n$, choose any $h_n = h\,n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, with $h$ an arbitrary positive constant and $0 < \delta < 1$. Compute $\hat\beta_n$ based on $h_n$ and construct $\hat e_i$ as defined in Theorem 2. Use $\hat e_i$ to compute the estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\xi}$ as discussed above. Then estimate $h^*$ by $\hat h^*$ using equation (3.21), with $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\xi}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h^*$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the plug-in method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant $h$. Second, by specifying the order of differentiability $r$, the researcher is restricted to a certain smoothness class.
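For a given bandwidth constant, the two estimates that enter the second stage can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the author's code: the function names (`kernel_weighted_fd`, `bias_corrected`, `two_bandwidth_estimate`) and argument names are hypothetical, the kernel is the standard normal density ($r = 1$), the estimator is written as kernel-weighted least squares on first differences with Eicker-White standard errors, and the plug-in step for $h^*$ via (3.21) is not implemented (the constant `h` is simply an input).

```python
import numpy as np


def normal_kernel(v):
    """Standard normal density, used here as a second order kernel (r = 1)."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)


def kernel_weighted_fd(dy, dx, index_diff, both_selected, h_n, kernel=normal_kernel):
    """Kernel-weighted least squares on first differences.

    dy            : (n,) first-differenced outcome (set to 0 where unobserved)
    dx            : (n, k) first-differenced regressors
    index_diff    : (n,) estimated selection-index difference, e.g. dw @ gamma_hat
    both_selected : (n,) 1 if the individual is observed in both periods, else 0
    Returns the coefficient vector and heteroskedasticity-robust (HC0) std. errors.
    """
    weights = kernel(index_diff / h_n) * both_selected
    root_w = np.sqrt(weights)
    y_t = root_w * dy                      # transformed dependent variable
    x_t = root_w[:, None] * dx             # transformed regressors
    xtx = x_t.T @ x_t
    beta = np.linalg.solve(xtx, x_t.T @ y_t)
    resid = y_t - x_t @ beta
    meat = (x_t * resid[:, None] ** 2).T @ x_t   # sum of e_i^2 x_i' x_i
    xtx_inv = np.linalg.inv(xtx)
    cov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))


def bias_corrected(beta_n, beta_n_delta, n, r, delta):
    """Linear combination of the two estimates, as in the Corollary."""
    a = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    return (beta_n - a * beta_n_delta) / (1.0 - a)


def two_bandwidth_estimate(dy, dx, index_diff, both_selected, h, r=1, delta=0.1):
    """Estimates with bandwidths h*n^(-mu) and h*n^(-delta*mu), then combine."""
    n = len(dy)
    mu = 1.0 / (2 * (r + 1) + 1)
    beta_n, se_n = kernel_weighted_fd(dy, dx, index_diff, both_selected, h * n ** (-mu))
    beta_nd, _ = kernel_weighted_fd(
        dy, dx, index_diff, both_selected, h * n ** (-delta * mu)
    )
    return beta_n, se_n, bias_corrected(beta_n, beta_nd, n, r, delta)
```

Writing the estimator as an OLS regression on square-root-weighted variables keeps the sketch close to what standard regression software does; a full two-stage implementation would call `two_bandwidth_estimate` once with an arbitrary `h`, estimate $h^*$, and call it again.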

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h\,n^{-1/(2(r+1)+1)}$, run an OLS regression of $\hat y_i = \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta y_i\,\Phi_i$ on $\hat x_i = \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta x_i\,\Phi_i$, and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix, using the residuals from the regression, $\hat e_i = \hat y_i - \hat x_i\hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\hat y_i$ on $\hat x_i$ with bandwidth $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$

in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency$^{14}$ of $\hat\beta_n$ if $h_n^{-1}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e., for $h_n \to 0$ and $n h_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{n h_n}\,(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{n h_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$.

The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-n asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.
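With two time periods, the conditional maximum likelihood estimator mentioned above reduces to an ordinary logit, without intercept, of the second-period indicator on the first-differenced regressors, using only individuals whose selection status changes. The sketch below illustrates that reduction. The function names are hypothetical, the Newton iteration is one of several ways to maximize the likelihood, and the simulated design is made up for the illustration (it is not the Monte Carlo design of this paper).

```python
import numpy as np


def conditional_logit_two_periods(d1, d2, w1, w2, max_iter=50, tol=1e-10):
    """Conditional logit for T = 2: logit of d2 on (w2 - w1) among switchers."""
    d1 = np.asarray(d1)
    d2 = np.asarray(d2)
    switch = (d1 + d2) == 1                     # only switchers are informative
    y = d2[switch].astype(float)
    x = (np.asarray(w2) - np.asarray(w1))[switch]
    gamma = np.zeros(x.shape[1])
    for _ in range(max_iter):                   # Newton-Raphson on the log-likelihood
        p = 1.0 / (1.0 + np.exp(-(x @ gamma)))
        score = x.T @ (y - p)
        hessian = (x * (p * (1.0 - p))[:, None]).T @ x
        step = np.linalg.solve(hessian, score)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma


def simulate_selection_panel(n, gamma, seed=0):
    """Hypothetical design: logistic errors, fixed effect correlated with w."""
    rng = np.random.default_rng(seed)
    k = len(gamma)
    w1 = rng.normal(size=(n, k))
    w2 = rng.normal(size=(n, k))
    eta = 0.5 * (w1[:, 0] + w2[:, 0]) + rng.normal(size=n)   # individual effect
    u1 = rng.logistic(size=n)
    u2 = rng.logistic(size=n)
    d1 = (w1 @ gamma + eta - u1 >= 0).astype(int)
    d2 = (w2 @ gamma + eta - u2 >= 0).astype(int)
    return d1, d2, w1, w2
```

A typical call would be `conditional_logit_two_periods(*simulate_selection_panel(20000, np.array([1.0, 1.0])))`. The individual effect drops out of the conditional likelihood, which is why the estimate does not depend on how `eta` is generated.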

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3}h_n \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator which, under weak (but stronger than Manski's) assumptions, is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness

14 Consistency of p may be established under the weaker restriction that zllF - yll = o(l) The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion This modification does not alter the basic restriction for obtaining an asymptotic distribution for 6which does not depend on the estimation of y in the first step namely that y has to be estimated at a faster rate than p Notice that in this case the upper bound on u in Assumption R12 would have to be replaced by ( 6p- 1)7 However this modification would affect the proof of Theorem 2 which would become unnecessarily complicated and long


we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

4 MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, and the first step estimation method; of the performance in practice of the proposed plug-in method for estimating the bandwidth constant; and, finally, of the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where $\beta_0 = 1$; $\gamma_1 = \gamma_2 = 1$; $w_{1it}$ and $w_{2it}$ are independent $N(-1, 1)$ variables; $\eta_i = (w_{1i1} + w_{1i2})/2 + 2\zeta_{1i}$, with $\zeta_{1i}$ an independent variable distributed uniformly over the interval (0, 1); $u_{it}$ is logistically distributed, normalized to have variance equal to 1; $x_{it} = w_{2it}$; $\alpha_i = (w_{2i1} + w_{2i2})/2 + \zeta_{2i}$, with $\zeta_{2i}$ an independent $N(0, 2)$ variable; and $\varepsilon_{it} = 0.8\,\xi_{it} + 0.6\,u_{it}$, with $\xi_{it}$ an independent standard normal variable. All data are generated i.i.d. across individuals and over time. This design implies that $\Pr(d_{i1} + d_{i2} = 1) = 0.37$ and $\Pr(d_{i1} = d_{i2} = 1) = 0.31$, so that approximately 37 percent of each sample is used in the first step estimation of the selection equation and approximately 31 percent in the second step. Each Monte Carlo experiment is performed 1000 times, while the same pseudorandom number sequences are used for each one of three different sample sizes $n$: 250, 1000, and 4000.

Table I presents the finite sample properties of the "naive" estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences, using only those individuals that are selected into the sample in both time periods, i.e., those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median


TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$
Mean Bias | Median Bias | RMSE | MAD

Panel B: Sizes of t tests
0.01 | 0.05 | 0.10 | 0.20

bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$ at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(v) = \phi(v)$, where $\phi$ is the density of the standard normal distribution,

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = n^{-1/5}$, $K(v) = \phi(v)$

$\hat\beta_n$ (Without Asymptotic Bias Correction) | $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)

Mean Bias | Median Bias | RMSE | MAD | Mean Bias | Median Bias | RMSE | MAD

Panel A: True $\gamma$: 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B: $\hat\gamma_{L}$: 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C: $\hat\gamma_{CMS}$: 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D: $\hat\gamma_{SCMS4}$: 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E: $\hat\gamma_{SCMS2}$: 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098


which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h\,n^{-1/(2(r+1)+1)} = h\,n^{-1/5}$ with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.$^{15}$ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_{L}$, which in this case will be consistent, since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,$^{16}$ denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator,$^{17}$ denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E) the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window

15 In the construction of the kernel weights of both the infeasible estimator of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i=1}^{n}\Delta d_i\,1\{\Delta w_{1i}\,g_1 + \Delta w_{2i}\,g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.
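The grid search described in this footnote can be sketched as follows. This is an illustration only: the function name is hypothetical, and the objective is taken to be the sample average of $\Delta d_i$ times the indicator that the differenced index is non-negative, which is how the garbled formula above is read here.

```python
import numpy as np


def conditional_max_score_grid(d1, d2, w1, w2, n_grid=2000):
    """Grid search for the conditional maximum score direction (two regressors).

    The candidate vectors lie on the unit circle, g = (sin(a), cos(a)),
    with the angle a on an equispaced grid over [0, 2*pi].
    """
    dd = np.asarray(d2) - np.asarray(d1)             # values in {-1, 0, +1}
    dw = np.asarray(w2) - np.asarray(w1)             # shape (n, 2)
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid)
    directions = np.column_stack([np.sin(angles), np.cos(angles)])
    index = dw @ directions.T                        # shape (n, n_grid)
    scores = (dd[:, None] * (index >= 0)).mean(axis=0)
    best = int(np.argmax(scores))                    # first maximizer on the grid
    return directions[best], scores[best]
```

Because the objective is a step function of the direction, ties are possible; `np.argmax` simply returns the first maximizer on the grid. The unit-circle parametrization imposes the scale normalization $\|g\| = 1$ automatically.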

17 The SCMS estimates are computed by maximizing

over all $g \in \mathbb{R}^2$ that have $|g_1| = 1$ and $g_2$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(v) = K_4(v)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(v) = \Phi(v)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5\,n^{-1/(2m+1)}$ ($m$ = 2 or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).


width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\hat\beta_n$ and $\hat{\hat\beta}_n$ using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).$^{18}$

In Panels C and D we use a fourth and a sixth order bias-reducing kernel,$^{19}$ and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the estimated optimal bandwidth constant $\hat h^*$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h^*$ is larger than the initial constant (here the initial bandwidth constant is set equal to one$^{20}$). Table V displays the mean of $\hat h^*$ across 1000 replications for different specifications of the initial constant, for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be explained by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that, for the sample sizes considered here, the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias, we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

19 The fourth-order kernel is $K_4(v) = 1.1\exp(-v^2/2) - 0.1\exp(-v^2/22)(1/\sqrt{11})$, and the sixth-order kernel is $K_6(v) = 1.5\exp(-v^2/2) - 0.6\exp(-v^2/8)(1/\sqrt{4}) + 0.1\exp(-v^2/18)(1/\sqrt{9})$. See Bierens (1987).
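The bias-reducing property of these two kernels can be checked numerically. The sketch below writes each kernel as a weighted sum of normal densities with variances 1 and 11 (fourth order) or 1, 4, and 9 (sixth order). It includes the factor $1/\sqrt{2\pi}$, which does not appear in the formulas as extracted; that factor is an assumption made here so that the kernels integrate to one. Function names are hypothetical.

```python
import numpy as np


def gaussian_mixture_kernel(v, weights, variances):
    """Weighted sum of N(0, s2) densities evaluated at v."""
    v = np.asarray(v, dtype=float)
    total = np.zeros_like(v)
    for theta, s2 in zip(weights, variances):
        total += theta * np.exp(-v**2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return total


def kernel_order4(v):
    """Fourth order kernel: weights 1.1 and -0.1 on variances 1 and 11."""
    return gaussian_mixture_kernel(v, [1.1, -0.1], [1.0, 11.0])


def kernel_order6(v):
    """Sixth order kernel: weights 1.5, -0.6, 0.1 on variances 1, 4, 9."""
    return gaussian_mixture_kernel(v, [1.5, -0.6, 0.1], [1.0, 4.0, 9.0])


def kernel_moment(kernel, j, half_width=80.0, step=1e-3):
    """Riemann-sum approximation of the j-th moment, integral of v^j K(v) dv."""
    grid = np.arange(-half_width, half_width + step, step)
    return float(np.sum(grid**j * kernel(grid)) * step)
```

The weights are pinned down by the moment conditions: for the fourth order kernel, $1.1 - 0.1 = 1$ and $1.1 - 0.1 \cdot 11 = 0$, so it integrates to one and its second moment vanishes; for the sixth order kernel the second and fourth moments both vanish. Odd moments are zero by symmetry.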

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size n = 100000.



TABLE III

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: TRUE $\gamma$

$\hat\beta_n$ (Without Asymptotic Bias Correction) | $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)

Mean Bias | Median Bias | RMSE | MAD | Mean Bias | Median Bias | RMSE | MAD

Panel A: $K(v) = \phi(v)$, $h_n = 0.5\,n^{-1/5}$: 0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B: $K(v) = \phi(v)$, $h_n = 3\,n^{-1/5}$: 0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C: $K(v) = K_4(v)$, $h_n = n^{-1/9}$: 0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D: $K(v) = K_6(v)$, $h_n = n^{-1/13}$: 0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for n = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A, and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

b The estimated standard errors of the median bias estimates for n = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A, and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = \hat h^*\,n^{-1/5}$, INITIAL $h = 1$, $K(v) = \phi(v)$

$\hat\beta_n$ (Without Asymptotic Bias Correction) | $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)

Mean Bias | Median Bias | RMSE | MAD | Mean Bias | Median Bias | RMSE | MAD

Panel A: True $\gamma$: 0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B: $\hat\gamma_{L}$: 0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C: $\hat\gamma_{CMS}$: 0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D: $\hat\gamma_{SCMS4}$: 0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401


TABLE V

Initial h = 0.5 | Initial h = 1 | Initial h = 2 | Initial h = 3

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h^*$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h_n = n^{-1/5}$, $K(v) = \phi(v)$. First row of panels: True $\gamma$ (Figures 1a-1c).

Note: Figures 1a, 1d, 1g: n = 250; Figures 1b, 1e, 1h: n = 1000; Figures 1c, 1f, 1i: n = 4000.


estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

Finally, we examine the size of t tests, where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$. To this end, we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance levels. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that, although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.

TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = n^{-1/5}$, $K(v) = \phi(v)$

$\hat\beta_n$ (Without Asymptotic Bias Correction) | $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)

0.01 | 0.05 | 0.10 | 0.20 | 0.01 | 0.05 | 0.10 | 0.20

Panel A: True $\gamma$: 0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B: $\hat\gamma_{L}$: 0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C: $\hat\gamma_{CMS}$: 0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D: $\hat\gamma_{SCMS4}$: 0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250


5 CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMAAl Let S = is a random sam- (ln)Z=l ( l h )L (M h )Z v s 2 0 where (Z y)]= ple from a disirrbuiron that has E ( I Z I ~ I for L ~at~sfiesW )ltM lt almost all W and the functron

P ~ l v(v ) l d v lt M Then E ( S ) = O ( k i ) and var(S) = O(hnh) Tlzus for s 2 1 S + 0 while for

P s = 0 S +f(O)E(ZI W = O)lL(v)d v procrded that E ( Z I W ) rs contrnuo~ts at W = 0


PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

LEMMAA2 (Liapounov CLT for doublc arrays) Let = (1 l t l )~= I tiwhere an Independent sequence of scalar random ~arrables that satis$es E( (I0 var( (I lt rn var(= +

V lt aand I3= El ( 61 +0 for some 8 E (01) as n + Then Jizh~N(0 V)

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1973).

COROLLARY = where (Z U)l= 1s a random sample from a Al Let ( ( I amp)L(w~)z d~stnbutlonsuch that E(ZI W) = 0 and E(IZI 1 W) lt M lt w for almost all W E(Z2 I W) IS conhnuous at W = 0 and the functlon L satrsfies llL(v)l dv lt 53 Then KS= ( l ix)~l=amp N(0

f W ( 0 ) ~ ( Z 2 I ~ =O ) ~ L ( V ) ~ ~ V )

PROOFOF LEMMA1 (a) Apply Lemma A1 with 2= Ax Ax di (1 j = I k) s = 0 and L(v ) = K(v)

(b-i) Apply Lemma A2 with tt= c1(1 amp)K(Uh) Ax Ac where c is a k X 1 vector of constants such that cc = 1

(b-ii) Note that by Assumption R5 Ah = AWThus wc may write

S = Ax( 1 ~ 1 ) I 3 ~ = ~ ( l h ) K ( H ( h )

Therefore E(S) = l(lh)K(Wh)Wg(W) dW where g(W) - E(Axr AlW)fw(W) is by assumption r times colltinuously differentiable with derivatives that are bounded on the support of W and has g(0) lt m A Taylor series expansion of g() around 0 and a change of variables W = vh lead to

17

1


for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm IIAll= dtrace(AA) (iii) In the Taylor series expansions c stands for a generic value between U and

PROOFOF LEMMA2 (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a


(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that A and A are 0(1) while A = o(l) The desired result will then follow from the fact that p lt p 2 implies that h i 1 ( - y ) = Op(niL-1= o(l)

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

We will show that Bl and B are 0(1) while B3 = o(l) Thc desired result will thenfollow from the fact that 1 - 2 p lt u lt p 2 implies that hi1(- y ) = Op(nL-) = o(l) and - y ) = o(n -~-) = o(l)


Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for r r 1

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+


where Ai and A are defined as in the proof of part (b-1) As we showed there both these quantities are bounded in probability for any h that satisfies h -O and nh -t 13 as n increases Furthermore from (1) above hi1( - y) = op(l) T ~ L I S OP(nF-) = the first two terms of the sum above are o(l) Now by (21

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where Bi and B2 are defined as in the proof of part (b-ii) and as we showed there they arc houndcd in probability for any I that satisfies nh + 13 as n increases Thus the first two terms of the sum above are o(l) Furthermore

REFERENCES

AHNH AND J L POWELL (1993) Semiparametric Estimation of Censorcd Selection Models with a Nonparamctric Selection Mechanism Journal of Econometrics 58 3-29

AMEMIYAT (1985) Aduancetl Econometrics Cambridge Harvard University Prcss ANDERSEWE (1970) Asymptotic Properties of Conditional Maximum Likelihood Estimators

Jortrrzal of the Royal Statistical Sociely Series B 32 283-301 BIERENSH J (1987) Kernel Estimators of Regression Functions in Advaaces in Ecor~omefrics

Fifih World Congress Vol 1 ed by T F Bewley Cambridge Cambridge University Prcss CAVANAGHC L (1987) Limiting Behavior of Estimators Defined by Optimization unpublished

manuscript CHAMBERLAING (1984) Panel Data Handbook of Econometrics Volume 11 edited by Z

Griliches and M Intriligator Amsterdam North-Holland Ch 22 -(1992) Binary Response Models for Panel Data Identification and Information unpub-

lished manuscript Department of Econon~ics Haward University CHARLIER AND A H 0 VANE B MELENBERG SOEST (1995) A Smoothed Maximum Score

Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation Sfatistica fiderlandica 49 324-342

CHUNGK L (1974) A Course in Probabilily Theoqi New York Academic Press GRONAUR (1974) Wage Comparisons-A Selectivity Bias Joztrnal of Political Eco~zorrzy 82

1110-1144

1364 EKATERINI KYRIAZIDOU

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

HECKMAN, J. J. (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

HECKMAN, J. J. (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

HONORÉ, B. E. (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

KYRIAZIDOU, E. (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

MANSKI, C. (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

MANSKI, C. (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

POWELL, J. L. (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

RASCH, G. (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.


The conditions of Assumption R9 are satisfied, for example, for $K(\nu)$ being the standard normal density function, which is a second order kernel.

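ASSUMPTION R10: $x_t^*$, $\varepsilon_t^*$, and $w_t$ have bounded $8 + 4\delta$ moments conditional on $W$, for some $\delta \in (0, 1)$. In addition, $E(\Delta x_l\, \Delta\varepsilon\, \Delta w_j \mid W)$ and $E(\Delta x_l\, \Delta\varepsilon\, \Delta w_j\, \Delta w_m \mid W)$ are continuous at $W = 0$ for all $l = 1, \ldots, k$ and $j, m = 1, \ldots, q$.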

ASSUMPTION R11: The parameter vector $\gamma$ in the selection equation lies in a compact$^{11}$ set, and $\hat\gamma_n$ is a consistent estimator that satisfies $\hat\gamma_n - \gamma = O_p(n^{-p})$, where $2/5 < p \le 1/2$.

For example, $p = 1/2$ if $\gamma$ is estimated by maximizing the conditional likelihood function.

ASSUMPTION R12: $h_n = h \cdot n^{-\mu}$, where $0 < h < \infty$ and $1 - 2p < \mu < p/2$.

Assumption R12 is crucial for establishing the result that follows. This result states that $\hat S_{xx}$, $\hat S_{xv}$, and $\hat S_{x\lambda}$ have the same probability limits as their infeasible counterparts $S_{xx}$, $S_{xv}$, and $S_{x\lambda}$, provided that the bandwidth sequence $h_n$ is chosen appropriately for any given rate of convergence of the first-step estimator, that is, for any given $p$, and for any degree of smoothness $r$.

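LEMMA 2: Let Assumptions R1-R12 hold. Then:
(a) $\hat S_{xx} - S_{xx} = o_p(1)$.
(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then (i) $\sqrt{nh_n}\,(\hat S_{xv} - S_{xv}) = o_p(1)$ and (ii) $\sqrt{nh_n}\,(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.
(c) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then (i) $h_n^{-(r+1)}(\hat S_{xv} - S_{xv}) = o_p(1)$ and (ii) $h_n^{-(r+1)}(\hat S_{x\lambda} - S_{x\lambda}) = o_p(1)$.

Lemma 2 readily implies that if $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$, then $\sqrt{nh_n}\,(\hat\beta_n - \tilde\beta_n) = o_p(1)$, while if $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \tilde\beta_n) = o_p(1)$. Since $(\hat\beta_n - \beta) = (\hat\beta_n - \tilde\beta_n) + (\tilde\beta_n - \beta)$, we have the following theorem.

THEOREM 1: Let Assumptions R1-R12 hold.
(a) If $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$, then $\sqrt{nh_n}\,(\hat\beta_n - \beta) \xrightarrow{d} N\!\left(\tilde h\,\Sigma_{xx}^{-1}\Sigma_{x\lambda},\; \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}\right)$.
(b) If $\sqrt{nh_n}\,h_n^{r+1} \to \infty$, then $h_n^{-(r+1)}(\hat\beta_n - \beta) \xrightarrow{p} \Sigma_{xx}^{-1}\Sigma_{x\lambda}$.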

11 Compactness of the parameter space is required for consistency of both Manski's estimator and the smoothed conditional maximum score estimator, while it is not required for the conditional maximum likelihood estimator. Notice, though, that since $\gamma$ can only be estimated up to scale, we can always normalize it so that it lies on the unit circle. Thus the compactness assumption is not restrictive.


Thus, in the limit, the fact that we are using $\hat\gamma_n$ to estimate $\beta$ does not affect the asymptotic distribution of $\hat\beta_n$. The lower bound on $\mu$ imposed by Assumption R12 is the key for this result to hold. In words, this bound implies that $\beta$ is estimated at a rate slower than $\gamma$. Indeed, from Theorem 1, the rate of convergence of $\hat\beta_n$ is $(nh_n)^{-1/2} = n^{-(1-\mu)/2}$, which is obviously slower than $n^{-p}$ since $\mu > 1 - 2p$. Thus, in effect, Assumption R12 requires that $\sqrt{nh_n}\,(\hat\gamma_n - \gamma) = o_p(1)$.

In principle we could allow $\beta$ to be estimated at the same rate as $\gamma$. Thus, if $\sqrt{nh_n}\,(\hat\gamma_n - \gamma) = O_p(1)$, for $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ we obtain the following asymptotic representation, which may be easily derived from the analysis of Lemma 2(b) in the Appendix

where

$$\Omega \equiv \operatorname{plim} \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h_n^{2}}\, K'\!\left(\frac{W_i}{h_n}\right) \Delta x_i'\, \Delta\lambda_i\, \Delta w_i\, \Phi_i ,$$

provided that $E(\Delta x'\, \Delta\lambda\, \Delta w \mid W)$ is continuous at $W = 0$ and $\nu K(\nu) \to 0$ as $|\nu| \to \infty$. Asymptotic normality of $\hat\beta_n$ may still be established if $\sqrt{nh_n}\,(\hat\gamma_n - \gamma)$ has an asymptotic representation of the form $\sqrt{nh_n}\,(\hat\gamma_n - \gamma) = \frac{1}{\sqrt{nh_n}}\sum_{i=1}^{n} \xi(\Delta w_i, \Delta d_i, \gamma) + o_p(1)$.

At first glance it looks attractive to eliminate the asymptotic bias of $\hat\beta_n$ by choosing $h_n$ so that $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h = 0$, or equivalently by setting $\mu > 1/(2(r+1)+1)$. In that case, however, the rate of convergence of $\hat\beta_n$ is lower than when $\tilde h > 0$. Indeed, the rate of convergence in distribution of $\hat\beta_n$ is maximized by making $\mu$ as small as possible, that is, by setting $\mu = 1/(2(r+1)+1)$, in which case it becomes $n^{-(r+1)/(2(r+1)+1)}$. Thus, for $r$ large enough the estimator converges at a rate that can be arbitrarily close to $n^{-1/2}$, provided also that $\gamma$ is estimated fast enough, that is, provided $p > (r+1)/(2(r+1)+1)$.
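The rate arithmetic in this paragraph can be checked with a few lines of Python. The sketch below is only an illustration: the function name `rate_summary` is an assumption made here, not something defined in the paper. For a given smoothness order $r$ it returns the bandwidth exponent $\mu = 1/(2(r+1)+1)$, the implied rate exponent $(1-\mu)/2$ of the second-step estimator, and the lower bound that the first-step exponent $p$ must exceed (from $1 - 2p < \mu$).

```python
from fractions import Fraction

def rate_summary(r):
    """Return (mu, rate, p_lower) for smoothness order r.

    mu      : bandwidth exponent in h_n = h * n**(-mu) that maximizes the rate
    rate    : the estimator of beta converges at n**(-rate), rate = (1 - mu) / 2
    p_lower : the first-step exponent p must be strictly larger than this value
    """
    d = 2 * (r + 1) + 1
    mu = Fraction(1, d)
    rate = (1 - mu) / 2            # algebraically equal to (r + 1) / d
    p_lower = Fraction(r + 1, d)   # rearranged from 1 - 2p < mu
    return mu, rate, p_lower

for r in (1, 3, 5):
    mu, rate, p_lower = rate_summary(r)
    print(f"r={r}: mu={mu}, rate exponent={rate}, need p > {p_lower}")
```

Exact fractions are used so that the exponents can be compared directly with the expressions in the text; as $r$ grows, the rate exponent approaches $1/2$ from below.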

Although the proposed estimator is asymptotically biased, it is possible to eliminate the asymptotic bias while maintaining the maximal rate of convergence, in the manner suggested by Bierens (1987).

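12 We can also derive an asymptotic representation for $\hat\beta_n$ in the case where $\gamma$ is estimated at rate $n^{-p}$ that is slower than $n^{-2/5}$. In this case we obtain $n^{p}(\hat\beta_n - \beta) = \Sigma_{xx}^{-1}\Omega\, n^{p}(\hat\gamma_n - \gamma) + o_p(1)$, which implies that $\hat\beta_n$ converges at the same rate as $\hat\gamma_n$, which is slower than the optimal rate obtained for the infeasible estimator $\tilde\beta_n$, that is, when $\gamma$ is known.

COROLLARY: Let $\hat\beta_n$ be the estimator with window width $h_n = h \cdot n^{-1/(2(r+1)+1)}$, and $\hat\beta_{n,\delta}$ the estimator with window width $h_{n,\delta} = h \cdot n^{-\delta/(2(r+1)+1)}$, where $\delta \in (0, 1)$. Define

$$\hat{\hat\beta}_n \equiv \frac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\,\hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}} .$$

Then $n^{(r+1)/(2(r+1)+1)}\,(\hat{\hat\beta}_n - \beta) \xrightarrow{d} N\!\left(0,\; h^{-1}\,\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}\right)$.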
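The linear combination in the Corollary can be sketched as follows. This is a minimal illustration for a scalar coefficient; the function name `bias_corrected` and its argument names are assumptions made here, and the two input estimates are taken as given (computed with window widths $h \cdot n^{-1/(2(r+1)+1)}$ and $h \cdot n^{-\delta/(2(r+1)+1)}$).

```python
def bias_corrected(beta_n, beta_n_delta, n, r, delta):
    """Combine two estimates so that the leading bias term cancels.

    beta_n       : estimate with window width h * n**(-1 / (2(r+1)+1))
    beta_n_delta : estimate with window width h * n**(-delta / (2(r+1)+1))
    n            : sample size
    r            : smoothness order, delta : a number in (0, 1)
    """
    c = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    return (beta_n - c * beta_n_delta) / (1.0 - c)

# Hypothetical check: feed in two "estimates" equal to the true value plus
# a leading bias of the form B * h_n**(r+1), with B an arbitrary constant.
n, r, delta, beta, B = 1000, 1, 0.1, 1.0, 0.7
d = 2 * (r + 1) + 1
biased = beta + B * n ** (-(r + 1) / d)
biased_delta = beta + B * n ** (-delta * (r + 1) / d)
print(bias_corrected(biased, biased_delta, n, r, delta))
```

In this stylized check the two inputs contain only the leading bias term, so the combination recovers the true value up to floating-point error; in actual samples the correction also adds sampling noise from the second estimate.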

In order to compute $\hat\beta_n$ or $\hat{\hat\beta}_n$ in an application, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator is likely to be more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant $h$. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.

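For a given order of differentiability $r$ and a given sample size $n$, the results of Theorem 1 suggest that $h_n = h \cdot n^{-\mu}$ be chosen so that $\mu = 1/(2(r+1)+1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant $h$. A natural way to proceed (see Horowitz (1992) and Härdle (1990)) is to choose $h$ so as to minimize some kind of measure of the distance of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

$$\mathrm{MSE} \equiv h^{2(r+1)}\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1} A\, \Sigma_{xx}^{-1}\Sigma_{x\lambda} + \frac{1}{h}\operatorname{trace}\!\left[\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1} A\right]$$

for any nonstochastic positive semidefinite matrix $A$ that satisfies $\Sigma_{x\lambda}'\Sigma_{xx}^{-1} A\, \Sigma_{xx}^{-1}\Sigma_{x\lambda} > 0$. It is straightforward to show that MSE is minimized by setting

$$(3.21)\qquad h = h^* \equiv \left[\frac{\operatorname{trace}\!\left[\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1} A\right]}{2(r+1)\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1} A\, \Sigma_{xx}^{-1}\Sigma_{x\lambda}}\right]^{1/(2(r+1)+1)} .$$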

This last expression suggests that we may construct a consistent estimate of $h^*$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{xv}$ and $\Sigma_{x\lambda}$.$^{13}$
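For a single regressor the weighting matrix $A$ cancels from equation (3.21), and the optimal constant reduces to simple arithmetic. The Python sketch below illustrates that special case only; the function names and the numerical inputs are hypothetical and stand in for estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$.

```python
def asymptotic_mse(h, sigma_xx, sigma_xv, sigma_xlam, r):
    """Asymptotic MSE as a function of the constant h (scalar case, A = 1)."""
    bias = sigma_xlam / sigma_xx            # Sigma_xx^{-1} Sigma_xlambda
    variance = sigma_xv / sigma_xx ** 2     # Sigma_xx^{-1} Sigma_xv Sigma_xx^{-1}
    return h ** (2 * (r + 1)) * bias ** 2 + variance / h

def optimal_constant(sigma_xx, sigma_xv, sigma_xlam, r):
    """Scalar version of equation (3.21)."""
    bias = sigma_xlam / sigma_xx
    variance = sigma_xv / sigma_xx ** 2
    return (variance / (2 * (r + 1) * bias ** 2)) ** (1.0 / (2 * (r + 1) + 1))

# Hypothetical plug-in values, chosen only to illustrate the arithmetic.
h_star = optimal_constant(sigma_xx=1.0, sigma_xv=4.0, sigma_xlam=1.0, r=1)
print(h_star, asymptotic_mse(h_star, 1.0, 4.0, 1.0, 1))
```

The formula trades off squared bias, which grows like $h^{2(r+1)}$, against variance, which shrinks like $1/h$; a poorly estimated $\Sigma_{x\lambda}$ near zero would push the implied constant toward very large values.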

THEOREM 2: Assume that Assumptions R1-R12 hold. (a) Let $\hat\beta_n$ be a consistent estimator of $\beta$ based on $h_n = h \cdot n^{-1/(2(r+1)+1)}$, and define $\hat e_i \equiv \Delta y_i - \Delta x_i\hat\beta_n$. Then

(b) Let $h_{n,\delta} = h \cdot n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then, for $\hat e_i$ defined as in part (a),

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's world wide web page.

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given $r$ and $n$, choose any $h_n = h \cdot n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h \cdot n^{-\delta/(2(r+1)+1)}$, with $h$ an arbitrary positive constant and $0 < \delta < 1$. Compute $\hat\beta_n$ based on $h_n$ and construct $\hat e_i$ as defined in Theorem 2. Use $\hat e_i$ to compute the estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ as discussed above. Then estimate $h^*$ by $\hat h^*$ using equation (3.21), with $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h^*$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the plug-in method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant $h$. Second, by specifying the order of differentiability $r$, the researcher is restricted to a certain smoothness class.
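The first-stage computation can be sketched in Python as kernel-weighted least squares on first differences, evaluated at the two window widths used by the procedure. The sketch assumes a scalar regressor, a standard normal kernel, and an already estimated selection index $\Delta w_i\hat\gamma_n$; all function and argument names are illustrative choices made here, not the author's code.

```python
import math

def normal_kernel(v):
    """Standard normal density, a second order kernel."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def kernel_weighted_ls(dy, dx, index, selected, h_n, kernel=normal_kernel):
    """Kernel-weighted least squares of dy on dx (scalar regressor).

    dy, dx   : first differences of the outcome and the regressor
    index    : estimated first differences of the selection index
    selected : 1 if the unit is observed in both periods, else 0
    h_n      : window width
    """
    num = 0.0
    den = 0.0
    for y, x, w, s in zip(dy, dx, index, selected):
        if not s:
            continue  # only units selected in both periods contribute
        weight = kernel(w / h_n) / h_n
        num += weight * x * y
        den += weight * x * x
    return num / den

def first_stage_estimates(dy, dx, index, selected, h, r, delta):
    """Estimates at window widths h*n**(-1/d) and h*n**(-delta/d), d = 2(r+1)+1."""
    n = len(dy)
    d = 2 * (r + 1) + 1
    h_n = h * n ** (-1.0 / d)
    h_n_delta = h * n ** (-delta / d)
    return (kernel_weighted_ls(dy, dx, index, selected, h_n),
            kernel_weighted_ls(dy, dx, index, selected, h_n_delta))
```

Units whose index difference is close to zero receive the largest weights, which is what removes the selection term asymptotically. The same numbers result from an ordinary regression after scaling both variables by the square root of the kernel weight.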

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h \cdot n^{-1/(2(r+1)+1)}$, run an OLS regression of $\hat I_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta y_i\,\Phi_i$ on $\hat X_i \equiv \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta x_i\,\Phi_i$ and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix, using the residuals from the regression, $\hat e_i = \hat I_i - \hat X_i\hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\hat I_i$ on $\hat X_i$ with bandwidth $h_n = h_{n,\delta}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$ in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency$^{14}$ of $\hat\beta_n$ if $h_n^{-2}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e., for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}\,(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{nh_n}\,h_n^{r+1} \to \tilde h$ with $0 \le \tilde h < \infty$.

The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-$n$ asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3}h_n^{2} \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator, which under weak (but stronger than Manski's) assumptions is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

14 Consistency of $\hat\beta_n$ may be established under the weaker restriction that $h_n^{-4/3}\|\hat\gamma_n - \gamma\| = o_p(1)$. The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion. This modification does not alter the basic restriction for obtaining an asymptotic distribution for $\hat\beta_n$ which does not depend on the estimation of $\gamma$ in the first step, namely that $\gamma$ has to be estimated at a faster rate than $\beta$. Notice that in this case the upper bound on $\mu$ in Assumption R12 would have to be replaced by $(6p - 1)/7$. However, this modification would affect the proof of Theorem 2, which would become unnecessarily complicated and long.

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, and the first step estimation method; of the performance in practice of the proposed plug-in method for estimating the bandwidth constant; and finally of the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where $\beta_0 = 1$, $\gamma_1 = \gamma_2 = 1$; $w_1$ and $w_2$ are independent $N(-1, 1)$ variables; $\eta = (w_{\cdot} + w_{\cdot})/2 + 2\zeta$, with $\zeta$ an independent variable distributed uniformly over the interval $(0, 1)$; $u$ is logistically distributed, normalized to have variance equal to 1; $x = w_2$; $\alpha = (w_{\cdot} + w_{\cdot})/2 + \zeta$, with $\zeta$ an independent $N(0, 2)$ variable; and $\varepsilon = 0.8\,\xi + 0.6\,u$, with $\xi$ an independent standard normal variable. All data are generated i.i.d. across individuals and over time. This design implies that $\Pr(d_1 + d_2 = 1) = 0.37$ and $\Pr(d_1 = d_2 = 1) = 0.31$, so that approximately 37 percent of each sample is used in the first step estimation of the selection equation and approximately 31 percent in the second step. Each Monte Carlo experiment is performed 1000 times, while the same pseudorandom number sequences are used for each one of three different sample sizes $n$: 250, 1000, and 4000.

Table I presents the finite sample properties of the "naive" estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences using only those individuals that are selected into the sample in both time periods, i.e., those that have $d_1 = d_2 = 1$. This estimator may be viewed as a limiting case of our proposed estimator with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$ at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$. Columns: Mean Bias, Median Bias, RMSE, MAD

Panel B: Sizes of t tests. Columns: 0.01, 0.05, 0.10, 0.20

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(\nu) = \phi(\nu)$, where $\phi$ is the density of the standard normal distribution, which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h \cdot n^{-1/(2(r+1)+1)} = h \cdot n^{-1/5}$ with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.$^{15}$ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_{L}$, which in this case will be consistent since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,$^{16}$ denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator,$^{17}$ denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$

$\hat\beta_n$ (Without Asymptotic Bias Correction) | $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)
Mean Bias, Median Bias, RMSE, MAD | Mean Bias, Median Bias, RMSE, MAD

Panel A, True $\gamma$: 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B, $\hat\gamma_{L}$: 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C, $\hat\gamma_{CMS}$: 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D, $\hat\gamma_{SCMS4}$: 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E, $\hat\gamma_{SCMS2}$: 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098

From Table II we see that the proposed estimator is less biased than the "naive" OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E) the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.

15 In the construction of the kernel weights of both the infeasible estimator $\tilde\beta_n$ of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i=1}^{n} \Delta d_i\, 1\{\Delta w_{1i}\, g_1 + \Delta w_{2i}\, g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.

17 The SCMS estimates are computed by maximizing

over all $g \in \mathbb{R}^2$ that have $|g_1| = 1$ and $g_2$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(\nu) = K_4(\nu)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(\nu) = \Phi(\nu)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5 \cdot n^{-1/(2m+1)}$ ($m = 2$ or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).
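The grid search used for the conditional maximum score estimates (an objective of the form $(1/n)\sum_i \Delta d_i\, 1\{\Delta w_{1i} g_1 + \Delta w_{2i} g_2 \ge 0\}$ maximized over the unit circle) can be sketched in Python as follows. This is a minimal illustration under the assumption of two selection regressors; the function names and the toy data are hypothetical, and ties are resolved by keeping the first maximizer on the grid, which is a choice made here rather than something stated in the paper.

```python
import math

def cms_objective(g1, g2, dd, dw1, dw2):
    """Average of delta-d over observations with nonnegative index difference."""
    total = 0
    for d, a, b in zip(dd, dw1, dw2):
        if a * g1 + b * g2 >= 0:
            total += d
    return total / len(dd)

def cms_grid_search(dd, dw1, dw2, grid_points=2000):
    """Maximize the objective over (sin g, cos g), g on an equispaced grid in [0, 2*pi]."""
    best = None
    for j in range(grid_points):
        g = 2.0 * math.pi * j / (grid_points - 1)
        g1, g2 = math.sin(g), math.cos(g)
        value = cms_objective(g1, g2, dd, dw1, dw2)
        if best is None or value > best[2]:
            best = (g1, g2, value)   # keep the first maximizer found
    return best

# Toy data: delta-d equals the sign of dw1 + dw2 (hypothetical, for illustration).
dw1 = [1.0, -1.0, 0.3, -0.2, 2.0, -1.0]
dw2 = [0.5, -2.0, 0.4, -0.1, -1.0, 0.5]
dd = [1, -1, 1, -1, 1, -1]
print(cms_grid_search(dd, dw1, dw2))
```

Because the objective is a step function, the maximizer is generally a set of directions rather than a single point, which is one way to see why the estimator is hard to compute by gradient methods.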

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for the infeasible estimator $\tilde\beta_n$ and its bias-corrected version, using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).$^{18}$

In Panels C and D we use a fourth and a sixth order bias-reducing kernel,$^{19}$ and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h^*$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h^*$ is larger than the initial constant (here the initial bandwidth constant is set equal to one$^{20}$). Table V displays the mean of $\hat h^*$ across 1000 replications for different specifications of the initial constant, for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic result of Theorem 1. It thus appears that for the particular design small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h^*$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

TABLE III

FINITE SAMPLE PROPERTIES OF THE INFEASIBLE ESTIMATOR (TRUE $\gamma$)

(Without Asymptotic Bias Correction) | (With Asymptotic Bias Correction)
Mean Bias, Median Bias, RMSE, MAD | Mean Bias, Median Bias, RMSE, MAD

Panel A, $K(\nu) = \phi(\nu)$, $h_n = 0.5 \cdot n^{-1/5}$: 0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B, $K(\nu) = \phi(\nu)$, $h_n = 3 \cdot n^{-1/5}$: 0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C, $K(\nu) = K_4(\nu)$, $h_n = n^{-1/9}$: 0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D, $K(\nu) = K_6(\nu)$, $h_n = n^{-1/13}$: 0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for $n$ = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A, and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

b The estimated standard errors of the median bias estimates for $n$ = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A, and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = \hat h^* \cdot n^{-1/5}$, INITIAL $h = 1$, $K(\nu) = \phi(\nu)$

$\hat\beta_n$ (Without Asymptotic Bias Correction) | $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)
Mean Bias, Median Bias, RMSE, MAD | Mean Bias, Median Bias, RMSE, MAD

Panel A, True $\gamma$: 0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B, $\hat\gamma_{L}$: 0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C, $\hat\gamma_{CMS}$: 0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D, $\hat\gamma_{SCMS4}$: 0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401

TABLE V

Columns: Initial $h = 0.5$, Initial $h = 1$, Initial $h = 2$, Initial $h = 3$

18 To estimate the standard errors for the median bias we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

19 The fourth-order kernel is $K_4(\nu) = 1.1\exp(-\nu^2/2) - 0.1\exp(-\nu^2/(2 \cdot 11))(1/\sqrt{11})$, and the sixth-order kernel is $K_6(\nu) = 1.5\exp(-\nu^2/2) - 0.6\exp(-\nu^2/(2 \cdot 4))(1/\sqrt{4}) + 0.1\exp(-\nu^2/(2 \cdot 9))(1/\sqrt{9})$. See Bierens (1987).

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size $n$ = 100000.
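The higher order kernels built as weighted sums of normal densities with variances 1, 11 (fourth order) and 1, 4, 9 (sixth order) can be checked numerically. The Python sketch below assumes the usual $1/\sqrt{2\pi}$ normalization so that each component is a proper normal density; the names `K4`, `K6`, `kernel_value`, and `moment` are illustrative choices made here. It verifies that the weights sum to one and that the low order even moments vanish.

```python
import math

# Each kernel is a list of (weight, variance) pairs of mean-zero normal densities.
K4 = [(1.1, 1.0), (-0.1, 11.0)]
K6 = [(1.5, 1.0), (-0.6, 4.0), (0.1, 9.0)]

def kernel_value(mix, v):
    """Evaluate the mixture kernel at v."""
    return sum(a * math.exp(-v * v / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
               for a, s2 in mix)

def moment(mix, j):
    """Integral of v**j * K(v) dv, using E[Z**j] = (j - 1)!! for even j."""
    if j % 2 == 1:
        return 0.0                      # odd moments vanish by symmetry
    double_factorial = 1
    for k in range(j - 1, 0, -2):
        double_factorial *= k
    return sum(a * double_factorial * s2 ** (j // 2) for a, s2 in mix)

for name, mix in (("K4", K4), ("K6", K6)):
    print(name, [round(moment(mix, j), 10) for j in (0, 2, 4, 6)])
```

A kernel of order $r + 1$ integrates to one and has zero moments of orders 1 through $r$, which is what removes the lower order bias terms; the first non-vanishing moment then enters the asymptotic bias.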

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator In Figure 1 we plot the quantiles of b against those of a normal random variable with the same mean and variance as the sample mean and sample variance of p Such quantile- quantile plots are provided for different sample sizes and for the true and the

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$. [Plots not reproduced; first row of panels: true $\gamma$.]

Note: Figures 1a, 1d, 1g: n = 250. Figures 1b, 1e, 1h: n = 1000. Figures 1c, 1f, 1i: n = 4000.

1357 SAMPLE SELECTION MODEL

estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

Finally, we examine the size of t tests where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$. To this end we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10 and 20 percent statistical significance levels. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
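The rejection fractions described above can be tabulated along the following lines. This is only a minimal sketch: the function name `empirical_size`, its arguments, and the use of two-sided standard normal critical values are assumptions made here for illustration, not details taken from the paper.

```python
import numpy as np
from statistics import NormalDist


def empirical_size(estimates, std_errors, true_value,
                   levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of Monte Carlo replications in which a two-sided t test
    rejects H0: beta = true_value, for each nominal level.

    Assumption: critical values come from the standard normal
    distribution, as suggested by an asymptotic normality result.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    t_stats = (estimates - true_value) / std_errors
    sizes = {}
    for alpha in levels:
        critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
        sizes[alpha] = float(np.mean(np.abs(t_stats) > critical))
    return sizes
```

Here `estimates` and `std_errors` would hold one entry per replication; a well-sized test returns fractions close to the nominal levels.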

TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$

$\hat\beta_n$ (Without Asymptotic Bias Correction) | $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)

0.01 0.05 0.10 0.20 | 0.01 0.05 0.10 0.20

Panel A True y 01610 02530 00590 01240 02180 00260 01120 02260 00210

Panel B TL 01580 02680 00450 01160 02140 00230 01140 02250 00180

Panel C Scnfs 01600 02720 00610 01170 02160 00350 01180 02390 00240

Panel D SScMS 01430 02570 00280 01220 02250 00190 01230 02430 00250


5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues are also left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMAAl Let S = is a random sam- (ln)Z=l ( l h )L (M h )Z v s 2 0 where (Z y)]= ple from a disirrbuiron that has E ( I Z I ~ I for L ~at~sfiesW )ltM lt almost all W and the functron

P ~ l v(v ) l d v lt M Then E ( S ) = O ( k i ) and var(S) = O(hnh) Tlzus for s 2 1 S + 0 while for

P s = 0 S +f(O)E(ZI W = O)lL(v)d v procrded that E ( Z I W ) rs contrnuo~ts at W = 0


PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

LEMMAA2 (Liapounov CLT for doublc arrays) Let = (1 l t l )~= I tiwhere an Independent sequence of scalar random ~arrables that satis$es E( (I0 var( (I lt rn var(= +

V lt aand I3= El ( 61 +0 for some 8 E (01) as n + Then Jizh~N(0 V)

PROOF: See Theorem 7.1.2 and the comment on page 209 in Chung (1973).

COROLLARY = where (Z U)l= 1s a random sample from a Al Let ( ( I amp)L(w~)z d~stnbutlonsuch that E(ZI W) = 0 and E(IZI 1 W) lt M lt w for almost all W E(Z2 I W) IS conhnuous at W = 0 and the functlon L satrsfies llL(v)l dv lt 53 Then KS= ( l ix)~l=amp N(0

f W ( 0 ) ~ ( Z 2 I ~ =O ) ~ L ( V ) ~ ~ V )

PROOFOF LEMMA1 (a) Apply Lemma A1 with 2= Ax Ax di (1 j = I k) s = 0 and L(v ) = K(v)

(b-i) Apply Lemma A2 with tt= c1(1 amp)K(Uh) Ax Ac where c is a k X 1 vector of constants such that cc = 1

(b-ii) Note that by Assumption R5 Ah = AWThus wc may write

S = Ax( 1 ~ 1 ) I 3 ~ = ~ ( l h ) K ( H ( h )

Therefore E(S) = l(lh)K(Wh)Wg(W) dW where g(W) - E(Axr AlW)fw(W) is by assumption r times colltinuously differentiable with derivatives that are bounded on the support of W and has g(0) lt m A Taylor series expansion of g() around 0 and a change of variables W = vh lead to

17

1


for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKS: (i) In what follows, M stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$. (iii) In the Taylor series expansions, $c$ stands for a generic value between $\hat\gamma_n$ and $\gamma$.

PROOFOF LEMMA2 (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a


(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that A and A are 0(1) while A = o(l) The desired result will then follow from the fact that p lt p 2 implies that h i 1 ( - y ) = Op(niL-1= o(l)

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

We will show that Bl and B are 0(1) while B3 = o(l) Thc desired result will thenfollow from the fact that 1 - 2 p lt u lt p 2 implies that hi1(- y ) = Op(nL-) = o(l) and - y ) = o(n -~-) = o(l)


Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for r r 1

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+


where Ai and A are defined as in the proof of part (b-1) As we showed there both these quantities are bounded in probability for any h that satisfies h -O and nh -t 13 as n increases Furthermore from (1) above hi1( - y) = op(l) T ~ L I S OP(nF-) = the first two terms of the sum above are o(l) Now by (21

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where $B_1$ and $B_2$ are defined as in the proof of part (b-ii) and, as we showed there, they are bounded in probability for any $h_n$ that satisfies $nh_n \to \infty$ as n increases. Thus the first two terms of the sum above are $o_p(1)$. Furthermore

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HARDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.


Thus, in the limit, the fact that we are using $\hat\gamma_n$ to estimate $\gamma$ does not affect the asymptotic distribution of $\hat\beta_n$. The lower bound on $\mu$ imposed by Assumption R12 is the key for this result to hold. In words, this bound implies that $\beta$ is estimated at a rate slower than $\gamma$. Indeed, from Theorem 1 the rate of convergence of $\hat\beta_n$ is $(nh_n)^{-1/2} = n^{-(1-\mu)/2}$, which is obviously slower than $n^{-p}$ since $\mu > 1 - 2p$. Thus, in effect, Assumption R12 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$.

In principle, we could allow $\beta$ to be estimated at the same rate as $\gamma$. Thus, if $\sqrt{nh_n}(\hat\gamma_n - \gamma) = O_p(1)$ for $\sqrt{nh_n}\,h_n^{r+1} \to \sqrt{\tilde h}$, we obtain the following asymptotic representation, which may be easily derived from the analysis of Lemma 2(b) in the Appendix:

where

n

0= ( l h ~ ) ~ ( ~ i h )plim (ln) Ax Awi Ahi Qi i = 1

provided that $E(\Delta x'\,\Delta w\,\Delta\lambda \mid W)$ is continuous at $W = 0$ and $\nu K(\nu) \to 0$ as $|\nu| \to \infty$. Asymptotic normality of $\hat\beta_n$ may still be established if $\sqrt{nh_n}(\hat\gamma_n - \gamma)$ has an asymptotic representation of the form $\sqrt{nh_n}(\hat\gamma_n - \gamma) = \frac{1}{\sqrt{nh_n}}\sum_i \psi_{in}(\Delta w_i, \Delta d_i; \gamma) + o_p(1)$.$^{12}$

At first glance it looks attractive to eliminate the asymptotic bias of $\hat\beta_n$ by choosing $h_n$ so that $\sqrt{nh_n}\,h_n^{r+1} \to \sqrt{\tilde h} = 0$, or equivalently by setting $\mu > 1/(2(r+1)+1)$. In that case, however, the rate of convergence of $\hat\beta_n$ is lower than when $\tilde h > 0$. Indeed, the rate of convergence in distribution of $\hat\beta_n$ is maximized by making $\mu$ as small as possible, that is, by setting $\mu = 1/(2(r+1)+1)$, in which case it becomes $n^{-(r+1)/(2(r+1)+1)}$. Thus, for r large enough, the estimator converges at a rate that can be arbitrarily close to $n^{-1/2}$, provided also that $\gamma$ is estimated fast enough, that is, provided $p > (r+1)/(2(r+1)+1)$.
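As a worked illustration of the rates just described (the particular values of r below are chosen here for illustration only), write the bandwidth as $h_n = h\,n^{-\mu}$ and evaluate the rate at the smallest admissible exponent:

```latex
\[
\mu^{*} = \frac{1}{2(r+1)+1},
\qquad
(n h_n)^{-1/2} \;\propto\; n^{-(1-\mu^{*})/2} \;=\; n^{-\frac{r+1}{2(r+1)+1}} .
\]
\[
r = 1:\ \mu^{*} = \tfrac{1}{5},\ \text{rate } n^{-2/5};
\qquad
r = 3:\ \mu^{*} = \tfrac{1}{9},\ \text{rate } n^{-4/9};
\qquad
r \to \infty:\ \frac{r+1}{2(r+1)+1} \to \tfrac{1}{2}.
\]
\[
\text{For } r = 1,\ \text{quadrupling } n \text{ scales the rate by } 4^{-2/5} \approx 0.574,
\ \text{compared with } 4^{-1/2} = 0.5 \text{ under root-}n \text{ convergence.}
\]
```

The last line shows why, with a second order kernel, the estimation error should shrink by somewhat less than half when the sample size is quadrupled.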

Although the proposed estimator is asymptotically biased, it is possible to eliminate the asymptotic bias while maintaining the maximal rate of convergence, in the manner suggested by Bierens (1987).

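COROLLARY: Let $\hat\beta_n$ be the estimator with window width $h_n = h\,n^{-1/(2(r+1)+1)}$, and $\hat\beta_{n,\delta}$ the estimator with window width $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, where $\delta \in (0,1)$. Define

$$\hat{\hat\beta}_n = \frac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\,\hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}}.$$

Then $n^{(r+1)/(2(r+1)+1)}\,(\hat{\hat\beta}_n - \beta) \xrightarrow{d} N\!\left(0,\; h^{-1}\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}\right)$.

12 We can also derive an asymptotic representation for $\hat\beta_n$ in the case where $\gamma$ is estimated at a rate $n^{-p}$ that is slower than $1/\sqrt{nh_n}$. In this case we obtain $n^{p}(\hat\beta_n - \beta) = \Sigma_{xx}^{-1}\Omega\, n^{p}(\hat\gamma_n - \gamma) + o_p(1)$, which implies that $\hat\beta_n$ converges at the same rate as $\hat\gamma_n$, which is slower than the optimal rate obtained for the infeasible estimator $\hat\beta_n$, that is, when $\gamma$ is known.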
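The linear combination in the Corollary is easy to compute once the two estimates are available. The sketch below is a minimal illustration; the function name `bias_corrected` and its argument names are hypothetical and chosen here only for readability.

```python
import numpy as np


def bias_corrected(beta_h, beta_h_delta, n, r, delta):
    """Combine two estimates as in the Corollary.

    beta_h       : estimate computed with bandwidth h * n**(-1/(2(r+1)+1))
    beta_h_delta : estimate computed with bandwidth h * n**(-delta/(2(r+1)+1))
    n            : sample size
    r            : assumed order of differentiability
    delta        : constant in (0, 1)
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    # Weight n^{-(1 - delta)(r + 1) / (2(r + 1) + 1)} from the Corollary.
    c = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    beta_h = np.asarray(beta_h, dtype=float)
    beta_h_delta = np.asarray(beta_h_delta, dtype=float)
    return (beta_h - c * beta_h_delta) / (1.0 - c)
```

The weight is chosen so that a leading bias term proportional to the bandwidth raised to the power r + 1 cancels exactly between the two estimates.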

In order to compute $\hat\beta_n$ or $\hat{\hat\beta}_n$ in an application, one needs to choose the kernel function K and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator will likely be more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant h. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.

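For a given order of differentiability r and a given sample size n, the results of Theorem 1 suggest that $h_n = h\,n^{-\mu}$ be chosen so that $\mu = 1/(2(r+1)+1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant h. A natural way to proceed (see Horowitz (1992) and Hardle (1990)) is to choose h so as to minimize some kind of measure of the distance of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

$$\mathrm{MSE} = E\!\left[(\hat\beta_n - \beta)'A(\hat\beta_n - \beta)\right] = n^{-2(r+1)/(2(r+1)+1)}\left\{ h^{2(r+1)}\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda} + h^{-1}\,\mathrm{trace}\!\left[\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}A\right]\right\}$$

for any nonstochastic positive semidefinite matrix A that satisfies $\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda} \neq 0$. It is straightforward to show that MSE is minimized by setting

$$h = h^{*} = \left[\frac{\mathrm{trace}\!\left[\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}A\right]}{2(r+1)\,\Sigma_{x\lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\lambda}}\right]^{1/(2(r+1)+1)}. \qquad (3.21)$$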

This last expression suggests that we may construct a consistent estimate of $h^{*}$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{xv}$ and $\Sigma_{x\lambda}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{xv}$ and $\Sigma_{x\lambda}$.
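Given estimates of the three matrices, the constant in equation (3.21) can be evaluated directly. The following is a minimal sketch under stated assumptions: the function name `plug_in_constant` and its argument names are hypothetical, the formula used is the ratio of the trace term to 2(r + 1) times the quadratic form in the bias term, raised to the power 1/(2(r + 1) + 1), and the weighting matrix A defaults to the identity purely for convenience.

```python
import numpy as np


def plug_in_constant(S_xx, S_xv, S_xl, r, A=None):
    """Evaluate the MSE-minimizing bandwidth constant of equation (3.21).

    S_xx, S_xv : (k, k) estimates of Sigma_xx and Sigma_xv
    S_xl       : length-k estimate of Sigma_x-lambda (the bias term)
    r          : assumed order of differentiability
    A          : (k, k) positive semidefinite weight matrix (identity if None)
    """
    S_xx = np.atleast_2d(np.asarray(S_xx, dtype=float))
    S_xv = np.atleast_2d(np.asarray(S_xv, dtype=float))
    k = S_xx.shape[0]
    S_xl = np.asarray(S_xl, dtype=float).reshape(k)
    A = np.eye(k) if A is None else np.atleast_2d(np.asarray(A, dtype=float))

    S_xx_inv = np.linalg.inv(S_xx)
    variance = S_xx_inv @ S_xv @ S_xx_inv   # sandwich variance matrix
    bias = S_xx_inv @ S_xl                  # leading bias direction
    numerator = np.trace(variance @ A)
    denominator = 2.0 * (r + 1) * float(bias @ A @ bias)
    if denominator <= 0.0:
        raise ValueError("the quadratic form in the bias term must be positive")
    return (numerator / denominator) ** (1.0 / (2 * (r + 1) + 1))
```

In the scalar case with r = 1 this reduces to the fifth root of the variance term divided by four times the squared bias term.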

THEOREM 2: Assume that Assumptions R1-R12 hold. (a) Let $\hat\beta_n$ be a consistent estimator of $\beta$ based on $h_n = h\,n^{-1/(2(r+1)+1)}$, and define $\hat e_i = \Delta y_i - \Delta x_i\hat\beta_n$.$^{13}$

Then

(b) Let $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then, for $\hat e_i$ defined as in part (a),

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's world wide web page.

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given r and n, choose any $h_n = h\,n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h\,n^{-\delta/(2(r+1)+1)}$, with h an arbitrary positive constant and $0 < \delta < 1$.

Compute $\hat\beta_n$ based on $h_n$ and construct $\hat e_i$ as defined in Theorem 2. Use $\hat e_i$ to compute the estimates of $\Sigma_{xx}$, $\Sigma_{xv}$ and $\Sigma_{x\lambda}$ as discussed above. Then estimate $h^{*}$ by $\hat h^{*}$, using equation (3.21) with $\Sigma_{xx}$, $\Sigma_{xv}$ and $\Sigma_{x\lambda}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h^{*}$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the plug-in method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant h. Second, by specifying the order of differentiability r, the researcher is restricted to a certain smoothness class.
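For a given bandwidth and first-step estimate, the estimator of the main equation amounts to a kernel-weighted least squares regression on first differences. The sketch below is only illustrative and rests on several assumptions: the function and argument names are hypothetical, the kernel is the standard normal density, `both_selected` marks individuals observed in both periods, and the standard errors use a basic heteroskedasticity-robust (HC0) sandwich formula rather than any specific expression from the paper.

```python
import numpy as np


def gaussian_kernel(v):
    """Standard normal density, a second order kernel."""
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)


def kernel_weighted_fd_ols(dy, dx, dw, both_selected, gamma_hat, h_n):
    """Kernel-weighted OLS of first-differenced y on first-differenced x.

    dy            : (n,) first differences of the outcome
    dx            : (n, k) first differences of the regressors
    dw            : (n, q) first differences of the selection covariates
    both_selected : (n,) indicator of selection in both periods
    gamma_hat     : (q,) first-step estimate for the selection equation
    h_n           : bandwidth
    Returns the coefficient vector and HC0 robust standard errors.
    """
    dy = np.asarray(dy, dtype=float)
    n = dy.shape[0]
    dx = np.asarray(dx, dtype=float).reshape(n, -1)
    dw = np.asarray(dw, dtype=float).reshape(n, -1)
    phi = np.asarray(both_selected, dtype=float)
    gamma_hat = np.atleast_1d(np.asarray(gamma_hat, dtype=float))

    # Square root of the kernel weight; zero for unselected individuals.
    index = dw @ gamma_hat
    root_w = np.sqrt(gaussian_kernel(index / h_n)) * phi

    y_t = root_w * dy
    x_t = root_w[:, None] * dx
    xtx = x_t.T @ x_t
    beta = np.linalg.solve(xtx, x_t.T @ y_t)

    # Sandwich covariance built from the transformed regression residuals.
    resid = y_t - x_t @ beta
    meat = (x_t * resid[:, None] ** 2).T @ x_t
    xtx_inv = np.linalg.inv(xtx)
    cov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))
```

With a very large bandwidth all selected observations receive nearly equal weight, so the result approaches ordinary least squares on the first differences of the selected subsample.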

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h\,n^{-1/(2(r+1)+1)}$, run an OLS regression of $\hat y_i = \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta y_i\,\Phi_i$ on $\hat x_i = \sqrt{K(\Delta w_i\hat\gamma_n/h_n)}\,\Delta x_i\,\Phi_i$ and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix using the residuals from the regression, $\hat e_i = \hat y_i - \hat x_i\hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\hat y_i$ on $\hat x_i$ with bandwidth $h_n = h_{n,\delta}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$ in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency$^{14}$ of $\hat\beta_n$ if $h_n^{-1}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e. for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{nh_n}\,h_n^{r+1} \to \sqrt{\tilde h}$ with $0 \le \tilde h < \infty$.

The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-n asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3}h_n \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator which, under weak (but stronger than Manski's) assumptions, is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness

14 Consistency of p may be established under the weaker restriction that zllF - yll = o(l) The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion This modification does not alter the basic restriction for obtaining an asymptotic distribution for 6which does not depend on the estimation of y in the first step namely that y has to be estimated at a faster rate than p Notice that in this case the upper bound on u in Assumption R12 would have to be replaced by ( 6p- 1)7 However this modification would affect the proof of Theorem 2 which would become unnecessarily complicated and long


we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, and the first step estimation method; of the performance in practice of the proposed plug-in method for estimating the bandwidth constant; and, finally, of the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where p O = 1 y = y = 1 w and w2 are independent N( -1 l ) variables q = (w + w)2 + 25 with 5 an independent variable distributed uni- formly over the interval (01) u is logistically distributed normalized to have variance equal to 1 x= w a = + w )2 + 5 with 5 an indepen- (w dent N(0 2) variable and s = 08t3 + 06ul with 5 an independent standard normal variable All data are generated iid across individuals and over time This design implies that Pr(d + d = 1)= 037 and Pr(d = d = 1) = 031 so that approximately 37 percent of each sample is used in the first step estimation of the selection equation and approximately 31 percent in the second step Each Monte Carlo experiment is performed 1000 times while the same pseudoran- dom number sequences are used for each one of three different sample sizes n 250 1000 and 4000

TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$ (columns: Mean Bias, Median Bias, RMSE, MAD)

Panel B: Sizes of t tests (columns: 0.01, 0.05, 0.10, 0.20)

Table I presents the finite sample properties of the "naive" estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences, using only those individuals that are selected into the sample in both time periods, i.e. those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes n. As the estimator may not have a finite mean or variance in any finite sample, we also report its median bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$ at the 1, 5, 10 and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.
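The four summary measures reported in the tables can be computed from the vector of replication estimates as sketched below. The function name `mc_summary` is hypothetical, and the MAD is taken here to be the median absolute deviation of the estimates from the true parameter value, which is an assumption about the definition rather than something stated in the text.

```python
import numpy as np


def mc_summary(estimates, true_value):
    """Mean bias, median bias, RMSE and MAD over Monte Carlo replications.

    Assumption: MAD is the median of |estimate - true_value|.
    """
    errors = np.asarray(estimates, dtype=float) - true_value
    return {
        "mean_bias": float(np.mean(errors)),
        "median_bias": float(np.median(errors)),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "mad": float(np.median(np.abs(errors))),
    }
```

The median-based measures remain well defined even when the estimator has no finite mean or variance, which is why they are reported alongside the mean bias and RMSE.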

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(v) = \phi(v)$, where $\phi$ is the density of the standard normal distribution, which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h \cdot n^{-1/(2(r+1)+1)} = h \cdot n^{-1/5}$ with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.15 In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_L$, which in this case will be consistent since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,16 denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator,17 denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Columns: $\hat\beta_n$ (Without Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD; $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD

Panel A, True $\gamma$: 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B, $\hat\gamma_L$: 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C, $\hat\gamma_{CMS}$: 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D, $\hat\gamma_{SCMS4}$: 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E, $\hat\gamma_{SCMS2}$: 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098
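As a quick numerical check of the bandwidth sequence used for Table II, the following minimal Python sketch evaluates $h_n = h \cdot n^{-1/(2(r+1)+1)}$ at the three sample sizes of the study, together with the convergence exponent $(r+1)/(2(r+1)+1)$ that appears in the Corollary of Theorem 1. The function names are hypothetical and chosen only for illustration.

```python
def bandwidth(n, h=1.0, r=1):
    """Return h_n = h * n**(-1 / (2*(r + 1) + 1))."""
    return h * n ** (-1.0 / (2 * (r + 1) + 1))


def rate_exponent(r=1):
    """Exponent a such that the estimator converges at rate n**(-a)."""
    return (r + 1) / (2 * (r + 1) + 1)


if __name__ == "__main__":
    for n in (250, 1000, 4000):
        print(n, round(bandwidth(n), 4))
    # Factor by which the sampling error should shrink when n is quadrupled;
    # a factor of 0.5 would correspond to the parametric root-n rate.
    print(round(4 ** (-rate_exponent(1)), 3))
```

With $r = 1$ the exponent is 2/5, so quadrupling the sample size scales the sampling error by $4^{-2/5}$, roughly 0.57, rather than by one half.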

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$ as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E) the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.

15 In the construction of the kernel weights of both the infeasible estimator $\hat\beta_n$ of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i=1}^{n} \Delta d_i\, 1\{\Delta w_{1i} g_1 + \Delta w_{2i} g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.

17 The SCMS estimates are computed by maximizing

over all $g \in \mathbb{R}^2$ that have $|g_2| = 1$ and $g_1$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(v) = K_4(v)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(v) = \Phi(v)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$, and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5 \cdot n^{-1/(2h+1)}$ ($h = 2$ or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).
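The grid search of footnote 16 can be sketched as follows. This is a minimal illustration, assuming the objective is the sample average of $\Delta d_i\, 1\{\Delta w_{1i} g_1 + \Delta w_{2i} g_2 \ge 0\}$ as read from that footnote; the function name `cms_grid` and its argument names are hypothetical, and the code is not the routine used in the paper.

```python
import numpy as np


def cms_grid(dd, dw1, dw2, n_grid=2000):
    """Grid-search conditional maximum score estimate of (g1, g2).

    dd  : first difference of the selection indicator, values in {-1, 0, 1}
    dw1 : first difference of the first selection regressor
    dw2 : first difference of the second selection regressor
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid)  # equispaced grid for g
    g1 = np.sin(angles)                             # g1 = sin(g)
    g2 = np.cos(angles)                             # g2 = cos(g), so |(g1, g2)| = 1
    objective = np.array([
        np.mean(dd * (dw1 * a + dw2 * b >= 0))      # (1/n) sum dd_i * 1{index >= 0}
        for a, b in zip(g1, g2)
    ])
    best = int(np.argmax(objective))                # first maximizer on the grid
    return g1[best], g2[best]
```

Because the objective is a step function, the maximizer is generally a set rather than a point; taking the first grid point that attains the maximum is one simple convention among several.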

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\hat\beta_n$ and $\hat{\hat\beta}_n$ using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).18

In Panels C and D we use a fourth and a sixth order bias-reducing kernel,19 and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h^*$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h^*$ is larger than the initial constant (here the initial bandwidth constant is set equal to one).20 Table V displays the mean of $\hat h^*$ across 1000 replications for different specifications of the initial constant, for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

19 The fourth-order kernel is $K_4(v) = 1.1\exp(-v^2/2) - 0.1\exp(-v^2/(2 \cdot 11))(1/\sqrt{11})$ and the sixth-order kernel is $K_6(v) = 1.5\exp(-v^2/2) - 0.6\exp(-v^2/(2 \cdot 4))(1/\sqrt{4}) + 0.1\exp(-v^2/(2 \cdot 9))(1/\sqrt{9})$. See Bierens (1987).

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size n = 100000.
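The bias-reducing property of the kernels in footnote 19 can be checked numerically: a kernel of order four should integrate to one with a vanishing second moment, and a kernel of order six should also have a vanishing fourth moment. The sketch below is illustrative only. It assumes each exponential term is scaled by $1/\sqrt{2\pi}$ so that the kernels integrate to one (the footnote as reproduced here shows no such constant), and the helper names are hypothetical.

```python
import numpy as np


def phi(v):
    """Standard normal density."""
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)


def k4(v):
    """Fourth-order kernel: 1.1*phi(v) - 0.1*phi(v/sqrt(11))/sqrt(11)."""
    s = np.sqrt(11.0)
    return 1.1 * phi(v) - 0.1 * phi(v / s) / s


def k6(v):
    """Sixth-order kernel: 1.5*phi(v) - 0.6*phi(v/2)/2 + 0.1*phi(v/3)/3."""
    return 1.5 * phi(v) - 0.6 * phi(v / 2.0) / 2.0 + 0.1 * phi(v / 3.0) / 3.0


def moment(kernel, j, lim=60.0, m=400001):
    """Riemann-sum approximation of the integral of v**j * kernel(v)."""
    v = np.linspace(-lim, lim, m)
    dv = v[1] - v[0]
    return float(np.sum(v ** j * kernel(v)) * dv)


if __name__ == "__main__":
    for name, kern in (("K4", k4), ("K6", k6)):
        print(name, [round(moment(kern, j), 6) for j in (0, 2, 4)])
```

Since the scale of the kernel cancels in a weighted least squares estimator, the normalizing constant assumed here should not affect the point estimates.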


TABLE III

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; TRUE $\gamma$

Columns: $\hat\beta_n$ (Without Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD; $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD

Panel A: $K(v) = \phi(v)$, $h_n = 0.5 \cdot n^{-1/5}$
0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B: $K(v) = \phi(v)$, $h_n = 3 \cdot n^{-1/5}$
0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C: $K(v) = K_4(v)$, $h_n = n^{-1/9}$
0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D: $K(v) = K_6(v)$, $h_n = n^{-1/13}$
0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for n = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

b The estimated standard errors of the median bias estimates for n = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = \hat h^* \cdot n^{-1/5}$, INITIAL $h = 1$, $K(v) = \phi(v)$

Columns: $\hat\beta_n$ (Without Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD; $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD

Panel A, True $\gamma$: 0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B, $\hat\gamma_L$: 0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C, $\hat\gamma_{CMS}$: 0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D, $\hat\gamma_{SCMS}$: 0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401

TABLE V

Columns: Initial h = 0.5, Initial h = 1, Initial h = 2, Initial h = 3

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h^*$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h_n = n^{-1/5}$, $K(v) = \phi(v)$. [Plots not reproduced; the surviving panel title is "True $\gamma$" (Figs. 1a-1c).]

Note: Figures 1a, 1d, 1g: n = 250. Figures 1b, 1e, 1h: n = 1000. Figures 1c, 1f, 1i: n = 4000.

Finally, we examine the size of t tests, where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$. To this end we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.

TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Columns: $\hat\beta_n$ (Without Asymptotic Bias Correction): 0.01, 0.05, 0.10, 0.20; $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction): 0.01, 0.05, 0.10, 0.20

Panel A, True $\gamma$: 0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B, $\hat\gamma_L$: 0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C, $\hat\gamma_{CMS}$: 0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D, $\hat\gamma_{SCMS}$: 0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250

5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas which maintain Assumptions R4 and R8 of Section 3

LEMMAAl Let S = is a random sam- (ln)Z=l ( l h )L (M h )Z v s 2 0 where (Z y)]= ple from a disirrbuiron that has E ( I Z I ~ I for L ~at~sfiesW )ltM lt almost all W and the functron

P ~ l v(v ) l d v lt M Then E ( S ) = O ( k i ) and var(S) = O(hnh) Tlzus for s 2 1 S + 0 while for

P s = 0 S +f(O)E(ZI W = O)lL(v)d v procrded that E ( Z I W ) rs contrnuo~ts at W = 0


PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

LEMMAA2 (Liapounov CLT for doublc arrays) Let = (1 l t l )~= I tiwhere an Independent sequence of scalar random ~arrables that satis$es E( (I0 var( (I lt rn var(= +

V lt aand I3= El ( 61 +0 for some 8 E (01) as n + Then Jizh~N(0 V)

PROOF: See Theorem 7.1.2 and the comment on page 209 in Chung (1973).

COROLLARY = where (Z U)l= 1s a random sample from a Al Let ( ( I amp)L(w~)z d~stnbutlonsuch that E(ZI W) = 0 and E(IZI 1 W) lt M lt w for almost all W E(Z2 I W) IS conhnuous at W = 0 and the functlon L satrsfies llL(v)l dv lt 53 Then KS= ( l ix)~l=amp N(0

f W ( 0 ) ~ ( Z 2 I ~ =O ) ~ L ( V ) ~ ~ V )

PROOFOF LEMMA1 (a) Apply Lemma A1 with 2= Ax Ax di (1 j = I k) s = 0 and L(v ) = K(v)

(b-i) Apply Lemma A2 with tt= c1(1 amp)K(Uh) Ax Ac where c is a k X 1 vector of constants such that cc = 1

(b-ii) Note that by Assumption R5 Ah = AWThus wc may write

S = Ax( 1 ~ 1 ) I 3 ~ = ~ ( l h ) K ( H ( h )

Therefore E(S) = l(lh)K(Wh)Wg(W) dW where g(W) - E(Axr AlW)fw(W) is by assumption r times colltinuously differentiable with derivatives that are bounded on the support of W and has g(0) lt m A Taylor series expansion of g() around 0 and a change of variables W = vh lead to


for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKS: (i) In what follows, $M$ stands for a generic constant, which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$. (iii) In the Taylor series expansions c stands for a generic value between U and

PROOFOF LEMMA2 (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a


(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that A and A are 0(1) while A = o(l) The desired result will then follow from the fact that p lt p 2 implies that h i 1 ( - y ) = Op(niL-1= o(l)

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

We will show that Bl and B are 0(1) while B3 = o(l) Thc desired result will thenfollow from the fact that 1 - 2 p lt u lt p 2 implies that hi1(- y ) = Op(nL-) = o(l) and - y ) = o(n -~-) = o(l)


Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for r r 1

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+


where Ai and A are defined as in the proof of part (b-1) As we showed there both these quantities are bounded in probability for any h that satisfies h -O and nh -t 13 as n increases Furthermore from (1) above hi1( - y) = op(l) T ~ L I S OP(nF-) = the first two terms of the sum above are o(l) Now by (21

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where Bi and B2 are defined as in the proof of part (b-ii) and as we showed there they arc houndcd in probability for any I that satisfies nh + 13 as n increases Thus the first two terms of the sum above are o(l) Furthermore

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics, Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

CHAMBERLAIN, G. (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HARDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

HECKMAN, J. J. (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

HECKMAN, J. J. (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

HONORÉ, B. E. (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

KYRIAZIDOU, E. (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

MANSKI, C. (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

MANSKI, C. (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

POWELL, J. L. (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

RASCH, G. (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.



Define

$$\hat{\hat\beta}_n \equiv \frac{\hat\beta_n - n^{-(1-\delta)(r+1)/(2(r+1)+1)}\,\hat\beta_{n,\delta}}{1 - n^{-(1-\delta)(r+1)/(2(r+1)+1)}}.$$

Then $n^{(r+1)/(2(r+1)+1)}(\hat{\hat\beta}_n - \beta) \xrightarrow{d} N(0,\; h^{-1}\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1})$.

In order to compute $\hat\beta_n$ or $\hat{\hat\beta}_n$ in an application, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth parameter $h_n$. Results on kernel density and regression function estimation suggest that the asymptotic performance of the estimator will likely be more sensitive to the choice of the window width than to the choice of the kernel. Furthermore, the asymptotic normality result of the Corollary above shows that the variance of the limiting distribution depends crucially on the choice of the constant $h$. We will thus focus here on the problem of bandwidth selection. Bierens (1987) discusses the construction of high order bias-reducing kernels.
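The linear combination that defines the bias-corrected estimator can be illustrated with a short sketch. It assumes the form $(\hat\beta_n - a_n\hat\beta_{n,\delta})/(1 - a_n)$ with $a_n = n^{-(1-\delta)(r+1)/(2(r+1)+1)}$, where $\hat\beta_{n,\delta}$ is computed with the larger bandwidth $h \cdot n^{-\delta/(2(r+1)+1)}$; the function names are hypothetical.

```python
def correction_weight(n, r=1, delta=0.1):
    """Weight a_n = n**(-(1 - delta)*(r + 1) / (2*(r + 1) + 1))."""
    return n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))


def bias_corrected(beta_n, beta_n_delta, n, r=1, delta=0.1):
    """Combine the two kernel estimates so that the leading bias terms cancel.

    beta_n       : estimate based on h_n = h * n**(-1/(2*(r+1)+1))
    beta_n_delta : estimate based on h * n**(-delta/(2*(r+1)+1))
    Works for floats or NumPy arrays of coefficients.
    """
    a = correction_weight(n, r, delta)
    return (beta_n - a * beta_n_delta) / (1.0 - a)
```

If each estimate equals the true coefficient plus a bias proportional to its bandwidth raised to the power $r + 1$, the weight $a_n$ is exactly the ratio of the two biases, which is why the combination removes the leading bias term.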

For a given order of differentiability $r$ and a given sample size $n$, the results of Theorem 1 suggest that $h_n = h \cdot n^{-\mu}$ be chosen so that $\mu = 1/(2(r+1)+1)$. So the problem of bandwidth selection reduces to the problem of choosing the constant $h$. A natural way to proceed (see Horowitz (1992) and Hardle (1990)) is to choose $h$ so as to minimize some kind of measure of the distance of the estimator from the true value, based on the asymptotic result of Theorem 1. Consider, for example, minimizing the asymptotic mean squared error of the estimator, defined as

-- - 2 + t r a c e [ X ( + hX+ )xx)x]X C

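for any nonstochastic positive semidefinite matrix $A$ that satisfies $\Sigma_{x\Lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\Lambda} > 0$. It is straightforward to show that MSE is minimized by setting

(3.21)  $$h = h^* = \left[\frac{\mathrm{trace}\left[\Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}A\right]}{2(r+1)\,\Sigma_{x\Lambda}'\Sigma_{xx}^{-1}A\,\Sigma_{xx}^{-1}\Sigma_{x\Lambda}}\right]^{1/(2(r+1)+1)}$$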

This last expression suggests that we may construct a consistent estimate of $h^*$ if consistent estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\Lambda}$ are available. By part (a) of Lemmata 1 and 2, $\hat S_{xx}$ consistently estimates $\Sigma_{xx}$ for any $h_n$ that satisfies $h_n \to 0$ and $n h_n \to \infty$. In the next theorem we provide consistent estimators of $\Sigma_{xv}$ and $\Sigma_{x\Lambda}$.
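As a numerical illustration of the optimal constant, the sketch below evaluates a ratio of the form trace(variance term) over $2(r+1)$ times (squared bias term), raised to the power $1/(2(r+1)+1)$. It assumes that the asymptotic mean squared error is proportional to $h^{2(r+1)} b'Ab + h^{-1}\,\mathrm{trace}(VA)$, with $b = \Sigma_{xx}^{-1}\Sigma_{x\Lambda}$ and $V = \Sigma_{xx}^{-1}\Sigma_{xv}\Sigma_{xx}^{-1}$; that reading, the symbol $\Sigma_{x\Lambda}$, and all function and argument names are assumptions made for this example only.

```python
import numpy as np


def mse_parts(S_xx, S_xv, S_xL, A):
    """Return (squared-bias part b'Ab, variance part trace(V A))."""
    S_xx_inv = np.linalg.inv(S_xx)
    b = S_xx_inv @ S_xL                          # bias direction, shape (k,)
    V = S_xx_inv @ S_xv @ S_xx_inv               # sandwich variance, shape (k, k)
    return float(b @ A @ b), float(np.trace(V @ A))


def asymptotic_mse(h, S_xx, S_xv, S_xL, A, r=1):
    """MSE as a function of h, up to a factor that does not depend on h."""
    bias2, var = mse_parts(S_xx, S_xv, S_xL, A)
    return h ** (2 * (r + 1)) * bias2 + var / h


def optimal_constant(S_xx, S_xv, S_xL, A, r=1):
    """Minimizer of asymptotic_mse over h > 0 (requires b'Ab > 0)."""
    bias2, var = mse_parts(S_xx, S_xv, S_xL, A)
    return (var / (2 * (r + 1) * bias2)) ** (1.0 / (2 * (r + 1) + 1))
```

In a plug-in procedure the population matrices would be replaced by their estimates, and the choice of the weighting matrix `A` (for example the identity) is left to the researcher.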

THEOREM 2: Assume that Assumptions R1-R12 hold. (a) Let $\hat\beta_n$ be a consistent estimator of $\beta$ based on $h_n = h \cdot n^{-1/(2(r+1)+1)}$, and define $\hat e_i = \Delta y_i - \Delta x_i\hat\beta_n$. Then

(b) Let $h_{n,\delta} = h \cdot n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then for $\hat e_i$ defined as in part (a)

13 The proof of Theorem 2 is omitted here to conserve space. It is available at the author's world wide web page.

Returning to our discussion about the construction of the estimator of P in practice we propose the following method (see also Horowitz (1992)) In the first stage for a given r and n choose any h = and any l ~ n - ( ~ ( ) + ~ ) hn 8 -- h n-8(2(1 1 1 with h an arbitrary positive constant and 0 lt S lt 1+

Compute fin based on h and construct g as defined in Theorem 2 Use 6 to compute^ the estimates of Z2 Zx and Z as discussed above Then estimate h by h using equation (321) with Cx1 C and C replaced by their consistent estimates In the second stage compute the asymptotic bias-corrected estimates as in the Corollary using as the constant in the definition of h and A8

This two-stage procedure is similar to the plug-in method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely choosing the initial constant $h$. Second, by specifying the order of differentiability $r$, the researcher is restricted to a certain smoothness class.

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h \cdot n^{-1/(2(r+1)+1)}$, run an OLS regression of $\tilde y_i = \sqrt{K(\Delta w_i \hat\gamma_n / h_n)}\, \Delta y_i \Phi_i$ on $\tilde x_i = \sqrt{K(\Delta w_i \hat\gamma_n / h_n)}\, \Delta x_i \Phi_i$, and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix using the residuals from the regression, $\tilde y_i - \tilde x_i \hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\tilde y_i$ on $\tilde x_i$ with bandwidth $h_{n,\delta}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$

in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency^14 of $\hat\beta_n$ if $h_n^{-1}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e., for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}\,(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{nh_n}\, h_n^{r+1} \to \sqrt{\tilde h}$ with $0 \le \tilde h < \infty$. The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are

satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-$n$ asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3} h_n \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator which, under weak (but stronger than Manski's) assumptions, is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness

14 Consistency of p may be established under the weaker restriction that zllF - yll = o(l) The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion This modification does not alter the basic restriction for obtaining an asymptotic distribution for 6which does not depend on the estimation of y in the first step namely that y has to be estimated at a faster rate than p Notice that in this case the upper bound on u in Assumption R12 would have to be replaced by ( 6p- 1)7 However this modification would affect the proof of Theorem 2 which would become unnecessarily complicated and long


we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, and the first step estimation method; of the performance in practice of the proposed plug-in method for estimating the bandwidth constant; and, finally, of the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where p O = 1 y = y = 1 w and w2 are independent N( -1 l ) variables q = (w + w)2 + 25 with 5 an independent variable distributed uni- formly over the interval (01) u is logistically distributed normalized to have variance equal to 1 x= w a = + w )2 + 5 with 5 an indepen- (w dent N(0 2) variable and s = 08t3 + 06ul with 5 an independent standard normal variable All data are generated iid across individuals and over time This design implies that Pr(d + d = 1)= 037 and Pr(d = d = 1) = 031 so that approximately 37 percent of each sample is used in the first step estimation of the selection equation and approximately 31 percent in the second step Each Monte Carlo experiment is performed 1000 times while the same pseudoran- dom number sequences are used for each one of three different sample sizes n 250 1000 and 4000

Table I presents the finite sample properties of the naive estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences using only those individuals that are selected into the sample in both time periods, i.e., those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$ at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$. Columns: Mean Bias, Median Bias, RMSE, MAD.

Panel B: Sizes of t tests. Nominal levels: 0.01, 0.05, 0.10, 0.20.

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(v) = \phi(v)$, where $\phi$ is the density of the standard normal distribution,

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$; $K(v) = \phi(v)$

Left block: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right block: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).

Columns in each block: Mean Bias, Median Bias, RMSE, MAD

Panel A: True $\gamma$: 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B: $\hat\gamma_L$: 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C: $\hat\gamma_{CMS}$: 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D: $\hat\gamma_{SCMS4}$: 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E: $\hat\gamma_{SCMS2}$: 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098


which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h \cdot n^{-1/(2(r+1)+1)} = h \cdot n^{-1/5}$ with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.^15 In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_L$, which in this case will be consistent since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,^16 denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator, denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E) the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window

15 In the construction of the kernel weights of both the infeasible estimator $\hat\beta_n$ of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i=1}^{n} \Delta d_i \, 1\{\Delta w_{1i} g_1 + \Delta w_{2i} g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.
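The grid search described in this footnote can be sketched in a few lines of Python. This is a minimal illustration, not the routine used in the paper. It assumes that the objective is the sample average of $\Delta d_i$ times an indicator that the index $\Delta w_{1i} g_1 + \Delta w_{2i} g_2$ is nonnegative. The function name `cms_grid_search` and its arguments are hypothetical.

```python
import numpy as np


def cms_grid_search(d_diff, w_diff, n_grid=2000):
    """Conditional maximum score by grid search over the unit circle.

    d_diff : (n,) array of first differences of the selection indicator
             (values in {-1, 0, 1}).
    w_diff : (n, 2) array of first differences of the two regressors.
    Returns the maximising direction (g1, g2) = (sin g, cos g) and its score.
    """
    d_diff = np.asarray(d_diff, dtype=float)
    w_diff = np.asarray(w_diff, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid)           # equispaced grid
    directions = np.column_stack([np.sin(angles), np.cos(angles)])
    index = w_diff @ directions.T                             # shape (n, n_grid)
    # Assumed objective: mean of d_diff * 1{index >= 0} for each grid point.
    scores = (d_diff[:, None] * (index >= 0)).mean(axis=0)
    best = int(np.argmax(scores))                             # first maximiser
    return directions[best], scores[best]
```

Because the objective is a step function, the maximiser is generally a set rather than a point; `np.argmax` simply returns the first grid point attaining the maximum. Parameterising the direction by an angle imposes the unit-norm scale normalisation automatically.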

17 The SCMS estimates are computed by maximizing

over all $g \in \mathbb{R}^2$ that have $|g_2| = 1$ and $g_1$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(v) = K_4(v)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(v) = \Phi(v)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5 \cdot n^{-1/(2m+1)}$ ($m = 2$ or $4$), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).


width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\hat\beta_n$ and $\hat{\hat\beta}_n$ using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).^18

In Panels C and D we use a fourth and a sixth order bias-reducing kernel^19 and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h^*$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h^*$ is larger than the initial constant (here the initial bandwidth constant is set equal to one^20). Table V displays the mean of $\hat h^*$ across 1000 replications for different specifications of the initial constant for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).
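A standard error of this kind can be sketched in Python as follows. This is a hedged illustration under two assumptions not stated in the footnote: that the standard error is the usual asymptotic one for a sample median, $1/(2 \hat f(m) \sqrt{R})$ with $R$ replications and $\hat f(m)$ the estimated density at the median, and that the rule-of-thumb bandwidth is the normal-reference value $1.06\,\hat\sigma R^{-1/5}$. The function name `median_standard_error` is hypothetical.

```python
import numpy as np


def median_standard_error(draws):
    """Asymptotic standard error of the median of Monte Carlo draws.

    Uses a normal-kernel density estimate evaluated at the sample median,
    with an assumed normal-reference bandwidth 1.06 * sd * R^(-1/5).
    """
    draws = np.asarray(draws, dtype=float)
    reps = draws.size
    med = np.median(draws)
    bw = 1.06 * draws.std(ddof=1) * reps ** (-0.2)
    u = (med - draws) / bw
    # Kernel density estimate at the median: average of scaled normal kernels.
    density_at_median = np.mean(np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)) / bw
    return 1.0 / (2.0 * density_at_median * np.sqrt(reps))
```

For draws that are close to normal, this standard error is roughly $\sqrt{\pi/2} \approx 1.25$ times the standard error of the mean.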

19 The fourth-order kernel is $K_4(v) = 1.1 \exp(-v^2/2) - 0.1 \exp(-v^2/(2 \cdot 11))(1/\sqrt{11})$, and the sixth-order kernel is $K_6(v) = 1.5 \exp(-v^2/2) - 0.6 \exp(-v^2/(2 \cdot 4))(1/\sqrt{4}) + 0.1 \exp(-v^2/(2 \cdot 9))(1/\sqrt{9})$. See Bierens (1987).
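The bias-reducing property of these kernels is that their low-order even moments vanish. The Python sketch below checks this numerically. It reads the two kernels as mixtures of normal densities with variances 1 and 11 (fourth order) and 1, 4, and 9 (sixth order). The factor $1/\sqrt{2\pi}$ is added here as an assumption so that each kernel integrates to one, and the helper names `gauss`, `k4`, `k6`, and `moment` are hypothetical.

```python
import numpy as np


def gauss(v, scale=1.0):
    """Normal density with standard deviation `scale`."""
    return np.exp(-0.5 * (v / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))


def k4(v):
    """Fourth-order kernel: weights 1.1 and -0.1 on variances 1 and 11."""
    return 1.1 * gauss(v) - 0.1 * gauss(v, np.sqrt(11.0))


def k6(v):
    """Sixth-order kernel: weights 1.5, -0.6, 0.1 on variances 1, 4, 9."""
    return 1.5 * gauss(v) - 0.6 * gauss(v, 2.0) + 0.1 * gauss(v, 3.0)


def moment(kernel, j, lim=60.0, m=600001):
    """Numerical j-th moment of a kernel via a fine Riemann sum."""
    v = np.linspace(-lim, lim, m)
    dv = v[1] - v[0]
    return float(np.sum(v ** j * kernel(v)) * dv)
```

With these definitions the second moment of `k4` and the second and fourth moments of `k6` are zero up to numerical error, since $1.1 - 0.1 \cdot 11 = 0$, $1.5 - 0.6 \cdot 4 + 0.1 \cdot 9 = 0$, and $1.5 - 0.6 \cdot 16 + 0.1 \cdot 81 = 0$.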

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size $n = 100000$.


TABLE III

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; TRUE $\gamma$

Left block: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right block: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).

Columns in each block: Mean Bias, Median Bias, RMSE, MAD

Panel A: $K(v) = \phi(v)$, $h_n = 0.5 \cdot n^{-1/5}$: 0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B: $K(v) = \phi(v)$, $h_n = 3 \cdot n^{-1/5}$: 0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C: $K(v) = K_4(v)$, $h_n = n^{-1/9}$: 0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D: $K(v) = K_6(v)$, $h_n = n^{-1/13}$: 0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for $n = 250$, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

The estimated standard errors of the median bias estimates for $n = 250$, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = \hat h^* \cdot n^{-1/5}$; INITIAL $h = 1$; $K(v) = \phi(v)$

Left block: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right block: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).

Columns in each block: Mean Bias, Median Bias, RMSE, MAD

Panel A: True $\gamma$: 0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B: $\hat\gamma_L$: 0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C: $\hat\gamma_{CMS}$: 0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D: $\hat\gamma_{SCMS4}$: 0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401


TABLE V

Columns: Initial $h = 0.5$, Initial $h = 1$, Initial $h = 2$, Initial $h = 3$

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h^*$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h_n = n^{-1/5}$, $K(v) = \phi(v)$. Panels 1a, 1b, 1c: True $\gamma$.

Note: Figures 1a, 1d, 1g: $n = 250$; Figures 1b, 1e, 1h: $n = 1000$; Figures 1c, 1f, 1i: $n = 4000$.


estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

Finally, we examine the size of t tests where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$. To this end we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
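The t statistics discussed here rest on a kernel-weighted least squares fit with heteroskedasticity-robust standard errors. The following Python sketch shows one way such a computation could be set up. It is a minimal illustration under stated assumptions, not the paper's code: it solves the weighted normal equations directly, which is numerically equivalent to OLS on observations scaled by the square root of the kernel weight when the kernel is nonnegative, and it uses the basic Eicker-White (HC0) sandwich form. The names `normal_kernel`, `kernel_weighted_ols`, and `t_statistic`, and all argument names, are hypothetical.

```python
import numpy as np


def normal_kernel(v):
    """Standard normal density, used as a second order kernel."""
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)


def kernel_weighted_ols(dy, dx, dw, gamma_hat, selected, h_n, kernel=normal_kernel):
    """Kernel-weighted least squares on first differences.

    dy, dx    : first differences of outcome (n,) and regressors (n, k).
    dw        : first differences of selection-equation regressors (n, q).
    gamma_hat : first-step estimate of the selection coefficients (q,).
    selected  : 1 if the unit is observed in both periods, else 0.
    h_n       : bandwidth.
    Returns the coefficient vector and Eicker-White (HC0) standard errors.
    """
    dy = np.asarray(dy, dtype=float)
    dx = np.asarray(dx, dtype=float).reshape(len(dy), -1)
    dw = np.asarray(dw, dtype=float).reshape(len(dy), -1)
    gamma_hat = np.asarray(gamma_hat, dtype=float).ravel()
    sel = np.asarray(selected, dtype=float)

    # Kernel weight is large when the selection index barely changes over time.
    weights = kernel(dw @ gamma_hat / h_n) * sel
    xtwx = dx.T @ (weights[:, None] * dx)
    xtwy = dx.T @ (weights * dy)
    beta = np.linalg.solve(xtwx, xtwy)

    resid = dy - dx @ beta
    scores = (weights * resid)[:, None] * dx       # weighted score contributions
    bread = np.linalg.inv(xtwx)
    cov = bread @ (scores.T @ scores) @ bread      # sandwich covariance
    return beta, np.sqrt(np.diag(cov))


def t_statistic(beta_hat, se, beta_null):
    """t statistic for the null hypothesis beta = beta_null."""
    return (beta_hat - beta_null) / se
```

Solving the weighted normal equations rather than taking square roots of the weights has the side benefit of also working with higher order kernels, which take negative values.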

TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$; $K(v) = \phi(v)$

Left block: $\hat\beta_n$ (Without Asymptotic Bias Correction). Right block: $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction).

Nominal levels in each block: 0.01, 0.05, 0.10, 0.20

Panel A: True $\gamma$: 0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B: $\hat\gamma_L$: 0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C: $\hat\gamma_{CMS}$: 0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D: $\hat\gamma_{SCMS4}$: 0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250


5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMAAl Let S = is a random sam- (ln)Z=l ( l h )L (M h )Z v s 2 0 where (Z y)]= ple from a disirrbuiron that has E ( I Z I ~ I for L ~at~sfiesW )ltM lt almost all W and the functron

P ~ l v(v ) l d v lt M Then E ( S ) = O ( k i ) and var(S) = O(hnh) Tlzus for s 2 1 S + 0 while for

P s = 0 S +f(O)E(ZI W = O)lL(v)d v procrded that E ( Z I W ) rs contrnuo~ts at W = 0


PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

LEMMAA2 (Liapounov CLT for doublc arrays) Let = (1 l t l )~= I tiwhere an Independent sequence of scalar random ~arrables that satis$es E( (I0 var( (I lt rn var(= +

V lt aand I3= El ( 61 +0 for some 8 E (01) as n + Then Jizh~N(0 V)

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1974).

COROLLARY = where (Z U)l= 1s a random sample from a Al Let ( ( I amp)L(w~)z d~stnbutlonsuch that E(ZI W) = 0 and E(IZI 1 W) lt M lt w for almost all W E(Z2 I W) IS conhnuous at W = 0 and the functlon L satrsfies llL(v)l dv lt 53 Then KS= ( l ix)~l=amp N(0

f W ( 0 ) ~ ( Z 2 I ~ =O ) ~ L ( V ) ~ ~ V )

PROOFOF LEMMA1 (a) Apply Lemma A1 with 2= Ax Ax di (1 j = I k) s = 0 and L(v ) = K(v)

(b-i) Apply Lemma A2 with tt= c1(1 amp)K(Uh) Ax Ac where c is a k X 1 vector of constants such that cc = 1

(b-ii) Note that by Assumption R5 Ah = AWThus wc may write

S = Ax( 1 ~ 1 ) I 3 ~ = ~ ( l h ) K ( H ( h )

Therefore E(S) = l(lh)K(Wh)Wg(W) dW where g(W) - E(Axr AlW)fw(W) is by assumption r times colltinuously differentiable with derivatives that are bounded on the support of W and has g(0) lt m A Taylor series expansion of g() around 0 and a change of variables W = vh lead to

17

1


for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$. (iii) In the Taylor series expansions c stands for a generic value between U and

PROOFOF LEMMA2 (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a


(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that A and A are 0(1) while A = o(l) The desired result will then follow from the fact that p lt p 2 implies that h i 1 ( - y ) = Op(niL-1= o(l)

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

We will show that Bl and B are 0(1) while B3 = o(l) Thc desired result will thenfollow from the fact that 1 - 2 p lt u lt p 2 implies that hi1(- y ) = Op(nL-) = o(l) and - y ) = o(n -~-) = o(l)


Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for r r 1

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+


where Ai and A are defined as in the proof of part (b-1) As we showed there both these quantities are bounded in probability for any h that satisfies h -O and nh -t 13 as n increases Furthermore from (1) above hi1( - y) = op(l) T ~ L I S OP(nF-) = the first two terms of the sum above are o(l) Now by (21

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where Bi and B2 are defined as in the proof of part (b-ii) and as we showed there they arc houndcd in probability for any I that satisfies nh + 13 as n increases Thus the first two terms of the sum above are o(l) Furthermore

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamics Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.



Then

(b) Let $h_{n,\delta} = h \cdot n^{-\delta/(2(r+1)+1)}$, where $0 < \delta < 1$. Then, for g defined as in part (a),

Returning to our discussion about the construction of the estimator of $\beta$ in practice, we propose the following method (see also Horowitz (1992)). In the first stage, for a given r and n, choose any $h_n = h \cdot n^{-1/(2(r+1)+1)}$ and any $h_{n,\delta} = h \cdot n^{-\delta/(2(r+1)+1)}$, with h an arbitrary positive constant and $0 < \delta < 1$.

Compute $\hat\beta_n$ based on $h_n$ and construct g as defined in Theorem 2. Use $\hat\beta_n$ to compute the estimates of $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ as discussed above. Then estimate the optimal bandwidth constant by $\hat h$ using equation (3.21), with $\Sigma_{xx}$, $\Sigma_{xv}$, and $\Sigma_{x\lambda}$ replaced by their consistent estimates. In the second stage, compute the asymptotic bias-corrected estimates as in the Corollary, using $\hat h$ as the constant in the definition of $h_n$ and $h_{n,\delta}$.

This two-stage procedure is similar to the plug-in method used in kernel density and regression function estimation, and it shares the same disadvantages. First, it involves the choice of a smoothing parameter in the first stage, namely the initial constant h. Second, by specifying the order of differentiability r, the researcher is restricted to a certain smoothness class.
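The two bandwidth sequences and the linear combination used for the bias correction can be sketched as follows. This is a minimal illustration, not a transcription of the Corollary: the function names are hypothetical, and the combination weight is derived here under the assumption that the leading bias of the estimator is proportional to $h^{r+1}$, so that it cancels between the two bandwidths.

```python
def bandwidths(n, r, h=1.0, delta=0.1):
    """Return (h_n, h_n_delta) for sample size n and smoothness order r.

    h is the bandwidth constant and 0 < delta < 1 (illustrative defaults).
    """
    expo = 1.0 / (2 * (r + 1) + 1)
    return h * n ** (-expo), h * n ** (-delta * expo)


def bias_corrected(beta_n, beta_n_delta, n, r, delta=0.1):
    """Combine the two estimates so that a bias term c * h^(r+1) cancels.

    Assumption: bias(beta_n) = c * h_n^(r+1) and
    bias(beta_n_delta) = c * h_n_delta^(r+1) for the same constant c.
    """
    # a equals (h_n / h_n_delta)^(r+1); the constant h drops out of the ratio
    a = n ** (-(1.0 - delta) * (r + 1) / (2 * (r + 1) + 1))
    return (beta_n - a * beta_n_delta) / (1.0 - a)


# Illustration with an artificial bias of exactly c * h^(r+1) around beta = 1
n, r, c = 1000, 1, 0.7
h_n, h_nd = bandwidths(n, r)
beta_tilde = bias_corrected(1.0 + c * h_n ** (r + 1), 1.0 + c * h_nd ** (r + 1), n, r)
```

With r = 1 the first bandwidth shrinks like $n^{-1/5}$, while the second shrinks much more slowly, which is what lets the difference between the two estimates identify the bias term.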

It is interesting to note that standard statistical software may be used for computing estimates for the main equation and their standard errors. Given a consistent estimate $\hat\gamma_n$ for the selection equation and a bandwidth $h_n = h \cdot n^{-1/(2(r+1)+1)}$, run an OLS regression of $\tilde y_i = \sqrt{K(\Delta w_i \hat\gamma_n / h_n)}\,\Delta y_i \Phi_i$ on $\tilde x_i = \sqrt{K(\Delta w_i \hat\gamma_n / h_n)}\,\Delta x_i \Phi_i$ and compute the (asymptotically biased) estimate $\hat\beta_n$. Standard errors are obtained from the Eicker-White covariance matrix using the residuals from the regression, $\hat e_i = \tilde y_i - \tilde x_i \hat\beta_n$. The bias-corrected estimate $\hat{\hat\beta}_n$ is obtained as a linear combination of $\hat\beta_n$ and $\hat\beta_{n,\delta}$, as described in the Corollary of Theorem 1, where $\hat\beta_{n,\delta}$ comes from the auxiliary OLS regression of $\tilde y_i$ on $\tilde x_i$ with bandwidth $h_n = h_{n,\delta}$.

We next turn to the problem of estimating the unknown parameter vector $\gamma$ in the selection equation. As we established, the asymptotic results obtained for the proposed estimator of $\beta$ depend crucially on the rate of convergence of the first-step estimator of $\gamma$. In particular, it is straightforward to establish consistency^14 of $\hat\beta_n$ if $h_n^{-2}(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e. for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}\,(\hat\gamma_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{nh_n}\,h_n^{r+1} \to \sqrt{\tilde h}$, with $0 \le \tilde h < \infty$.

The conditions for obtaining consistency and asymptotic normality of $\hat\beta_n$ are satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-n asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat\gamma_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3} h_n^2 \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution for $\hat\beta_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator which, under weak (but stronger than Manski's) assumptions, is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

14 Consistency of $\hat\beta_n$ may be established under the weaker restriction that $h_n^{-4/3}\|\hat\gamma_n - \gamma\| = o_p(1)$. The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion. This modification does not alter the basic restriction for obtaining an asymptotic distribution for $\hat\beta_n$ which does not depend on the estimation of $\gamma$ in the first step, namely that $\gamma$ has to be estimated at a faster rate than $\beta$. Notice that in this case the upper bound on $\mu$ in Assumption R12 would have to be replaced by $(6p - 1)/7$. However, this modification would affect the proof of Theorem 2, which would become unnecessarily complicated and long.

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, the first step estimation method, the performance in practice of the proposed plug-in method for estimating the bandwidth constant, and finally the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where $\beta_0 = 1$; $\gamma_1 = \gamma_2 = 1$; $w_1$ and $w_2$ are independent N(-1, 1) variables; $\eta = (w + w)/2 + 2\zeta_1$, with $\zeta_1$ an independent variable distributed uniformly over the interval (0, 1); u is logistically distributed, normalized to have variance equal to 1; $x = w$; $\alpha = (w + w)/2 + \zeta_2$, with $\zeta_2$ an independent N(0, 2) variable; and $\varepsilon = 0.8\zeta_3 + 0.6u$, with $\zeta_3$ an independent standard normal variable. All data are generated i.i.d. across individuals and over time. This design implies that $\Pr(d_{i1} + d_{i2} = 1) = 0.37$ and $\Pr(d_{i1} = d_{i2} = 1) = 0.31$, so that approximately 37 percent of each sample is used in the first step estimation of the selection equation and approximately 31 percent in the second step. Each Monte Carlo experiment is performed 1000 times, while the same pseudorandom number sequences are used for each one of three different sample sizes n: 250, 1000, and 4000.

TABLE I
Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$ (columns: Mean Bias, Median Bias, RMSE, MAD)
Panel B: Sizes of t tests (columns: 0.01, 0.05, 0.10, 0.20)

Table I presents the finite sample properties of the "naive" estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences, using only those individuals that are selected into the sample in both time periods, i.e. those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes n. As the estimator may not have a finite mean or variance in any finite sample, we also report its median bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$ at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.
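The relation between the naive estimator and the kernel-weighted estimator can be illustrated numerically. The sketch below is a hedged illustration on hypothetical simulated data (it is not the Monte Carlo design of this section): it runs least squares on first differences with normal kernel weights, computes Eicker-White standard errors for the weighted regression, and checks that a very large bandwidth reproduces unweighted OLS on the units observed in both periods. The function name and argument layout are assumptions made for the example.

```python
import numpy as np


def kernel_weighted_ols(dy, dx, index, selected, h):
    """Kernel-weighted least squares on first differences.

    dy: (n,) differenced outcome; dx: (n, k) differenced regressors;
    index: (n,) differenced selection index evaluated at a first-step estimate;
    selected: (n,) boolean, True when the unit is observed in both periods;
    h: bandwidth. Returns (coefficients, Eicker-White standard errors).
    """
    v = index[selected] / h
    w = np.exp(-0.5 * v**2) / (np.sqrt(2.0 * np.pi) * h)  # (1/h) * normal kernel
    root_w = np.sqrt(w)
    X = dx[selected] * root_w[:, None]
    y = dy[selected] * root_w
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    meat = (X * (resid**2)[:, None]).T @ X
    cov = XtX_inv @ meat @ XtX_inv  # heteroskedasticity-robust sandwich
    return beta, np.sqrt(np.diag(cov))


# Hypothetical data, used only to exercise the function
rng = np.random.default_rng(0)
n = 500
dx = rng.normal(size=(n, 1))
index = rng.normal(size=n)
selected = rng.random(n) < 0.5
dy = 1.0 * dx[:, 0] + rng.normal(size=n)

beta_h, se_h = kernel_weighted_ols(dy, dx, index, selected, h=n ** (-0.2))
beta_inf, _ = kernel_weighted_ols(dy, dx, index, selected, h=1e6)
beta_ols = np.linalg.lstsq(dx[selected], dy[selected], rcond=None)[0]
```

As the bandwidth grows, the kernel weights become constant across observations, so `beta_inf` and `beta_ols` agree up to rounding.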

TABLE II
FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction); $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)
Columns in each group: Mean Bias, Median Bias, RMSE, MAD

Panel A: True $\gamma$
0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B: $\hat\gamma_L$
0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C: $\hat\gamma_{CMS}$
0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D: $\hat\gamma_{SCMS4}$
0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E: $\hat\gamma_{SCMS2}$
0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying r = 1 and using $K(v) = \phi(v)$, where $\phi$ is the density of the standard normal distribution, which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h \cdot n^{-1/(2(r+1)+1)} = h \cdot n^{-1/5}$ with h = 1. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.^15 In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_L$, which in this case will be consistent since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,^16 denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator, denoted by $\hat\gamma_{SCMS}$.^17 In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent at rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E), the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window

15 In the construction of the kernel weights of both the infeasible estimator $\hat\beta_n$ of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i=1}^n \Delta d_i\, 1\{\Delta w_{1i} g_1 + \Delta w_{2i} g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with g ranging in a 2000-point equispaced grid from 0 to $2\pi$.
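The grid search described in this footnote can be sketched as follows. The code is an illustrative reading of the objective function, with an indicator inside the sum; the function name, the hypothetical noise-free data, and the choice to exclude the endpoint $2\pi$ from the grid are assumptions of the example.

```python
import numpy as np


def cms_grid_search(dd, dw, n_grid=2000):
    """Maximize (1/n) * sum_i dd_i * 1{dw_i @ g >= 0} over the unit circle.

    dd: (n,) differenced selection indicators in {-1, 0, 1};
    dw: (n, 2) differenced selection regressors.
    Returns the maximizing direction (g1, g2) and the attained score.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    directions = np.column_stack([np.sin(angles), np.cos(angles)])  # g1, g2
    inside = dw @ directions.T >= 0                                 # (n, n_grid)
    scores = (inside * dd[:, None]).mean(axis=0)
    best = int(np.argmax(scores))
    return directions[best], float(scores[best])


# Hypothetical noise-free data: the sign of dd is determined by dw @ g_true
rng = np.random.default_rng(1)
dw = rng.normal(size=(2000, 2))
g_true = np.array([np.sin(np.pi / 4), np.cos(np.pi / 4)])
dd = np.where(dw @ g_true >= 0, 1, -1)

g_hat, score = cms_grid_search(dd, dw)
```

Because the parameter is searched on the unit circle, only the direction of $\gamma$ is recovered, which is consistent with the normalization described in footnote 15.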

17 The SCMS estimates are computed by maximizing

over all $g \in \mathbb{R}^2$ that have $|g_2| = 1$ and $g_1$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(v) = K_4(v)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(v) = \Phi(v)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5 \cdot n^{-1/(2m+1)}$ (m = 2 or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).

width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; invariably, however, it increases its variability.

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\hat\beta_n$ and $\hat{\hat\beta}_n$ using a bandwidth constant h equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).^18

In Panels C and D we use a fourth and a sixth order bias-reducing kernel^19 and set $h_n = n^{-1/(2(r+1)+1)}$ with r = 3 and r = 5, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h$ is larger than the initial constant (here the initial bandwidth constant is set equal to one^20). Table V displays the mean of $\hat h$ across 1000 replications for different specifications of the initial constant for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here the estimated asymptotic bias of the estimator decreases with the bandwidth constant h, contrary to the asymptotic

18 To estimate the standard errors for the median bias we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).
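A calculation of this kind can be sketched as follows. Two ingredients are assumptions of the example rather than statements from the paper: the usual large-sample formula for the standard error of a sample median, $1/(2 f(m)\sqrt{R})$ for R replications with density f at the median m, and the reading of the rule-of-thumb bandwidth as $1.06\,\hat\sigma R^{-1/5}$. The function name and the simulated draws are hypothetical.

```python
import numpy as np


def median_standard_error(draws):
    """Approximate standard error of the sample median of `draws`.

    The density at the median is estimated with a normal kernel and a
    rule-of-thumb bandwidth (assumed here to be 1.06 * sd * R^(-1/5)).
    """
    x = np.asarray(draws, dtype=float)
    reps = x.size
    med = np.median(x)
    bw = 1.06 * x.std(ddof=1) * reps ** (-0.2)
    u = (med - x) / bw
    f_med = np.exp(-0.5 * u**2).sum() / (reps * bw * np.sqrt(2.0 * np.pi))
    return 1.0 / (2.0 * f_med * np.sqrt(reps))


# Hypothetical example: 1000 standard normal draws stand in for the
# replicated estimates
rng = np.random.default_rng(2)
se_median = median_standard_error(rng.normal(size=1000))
```

For standard normal draws the exact large-sample value is about 0.040 with 1000 replications, and the kernel estimate should land close to it.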

19 The fourth-order kernel is $K_4(v) = 1.1\exp(-v^2/2) - 0.1\exp(-v^2/(2 \cdot 11))(1/\sqrt{11})$ and the sixth-order kernel is $K_6(v) = 1.5\exp(-v^2/2) - 0.6\exp(-v^2/(2 \cdot 4))(1/\sqrt{4}) + 0.1\exp(-v^2/(2 \cdot 9))(1/\sqrt{9})$. See Bierens (1987).
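The bias-reducing property of these kernels can be checked numerically: a kernel of order four has a vanishing second moment, and a kernel of order six has vanishing second and fourth moments. The sketch below writes both kernels as mixtures of normal densities with variances (1, 11) and (1, 4, 9). Including the $1/\sqrt{2\pi}$ normalization, so that each kernel integrates to one, is an assumption of the example, as are the function names.

```python
import numpy as np


def normal_mixture_kernel(v, weights, variances):
    """Weighted sum of mean-zero normal densities evaluated at v."""
    v = np.asarray(v, dtype=float)
    out = np.zeros_like(v)
    for a, s2 in zip(weights, variances):
        out += a * np.exp(-v**2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return out


def k4(v):
    return normal_mixture_kernel(v, (1.1, -0.1), (1.0, 11.0))


def k6(v):
    return normal_mixture_kernel(v, (1.5, -0.6, 0.1), (1.0, 4.0, 9.0))


def kernel_moment(kernel, j, limit=60.0, points=600001):
    """Riemann-sum approximation of the integral of v^j * kernel(v)."""
    v = np.linspace(-limit, limit, points)
    dv = v[1] - v[0]
    return float(np.sum(v**j * kernel(v)) * dv)


moments_k4 = [kernel_moment(k4, j) for j in (0, 2, 4)]
moments_k6 = [kernel_moment(k6, j) for j in (0, 2, 4, 6)]
```

The mixture weights are what make the low-order moments cancel: for $K_4$, $1.1 \cdot 1 - 0.1 \cdot 11 = 0$, and for $K_6$, $1.5 - 0.6 \cdot 4 + 0.1 \cdot 9 = 0$ and $1.5 - 0.6 \cdot 16 + 0.1 \cdot 81 = 0$.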

20 We chose the initial h equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size n = 100000.


TABLE III
FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; TRUE $\gamma$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction); $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)
Columns in each group: Mean Bias, Median Bias, RMSE, MAD

Panel A: $K(v) = \phi(v)$, $h_n = 0.5 \cdot n^{-1/5}$
0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B: $K(v) = \phi(v)$, $h_n = 3 \cdot n^{-1/5}$
0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C: $K(v) = K_4(v)$, $h_n = n^{-1/9}$
0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D: $K(v) = K_6(v)$, $h_n = n^{-1/13}$
0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for n = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A, and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

b The estimated standard errors of the median bias estimates for n = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A, and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV
FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = \hat h \cdot n^{-1/5}$, INITIAL h = 1, $K(v) = \phi(v)$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction); $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)
Columns in each group: Mean Bias, Median Bias, RMSE, MAD

Panel A: True $\gamma$
0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B: $\hat\gamma_L$
0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C: $\hat\gamma_{CMS}$
0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D: $\hat\gamma_{SCMS4}$
0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401

TABLE V
Columns: Initial h = 0.5, Initial h = 1, Initial h = 2, Initial h = 3

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h_n = n^{-1/5}$, $K(v) = \phi(v)$. Panel shown: True $\gamma$ (Fig. 1a, Fig. 1b, Fig. 1c).
Note: Figures 1a, 1d, 1g: n = 250. Figures 1b, 1e, 1h: n = 1000. Figures 1c, 1f, 1i: n = 4000.
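The points of such a quantile-quantile plot can be computed as in the sketch below, which pairs the ordered estimates with the quantiles of a normal distribution that has the same mean and variance. The plotting positions $(i - 0.5)/R$ and the simulated draws are assumptions of the example; the paper does not state which plotting positions were used.

```python
import numpy as np
from statistics import NormalDist


def qq_points(estimates):
    """Return (normal quantiles, ordered estimates) for a Q-Q plot.

    The reference normal distribution matches the sample mean and the
    sample variance of the estimates.
    """
    x = np.sort(np.asarray(estimates, dtype=float))
    reps = x.size
    probs = (np.arange(1, reps + 1) - 0.5) / reps
    reference = NormalDist(mu=float(x.mean()), sigma=float(x.std(ddof=1)))
    theoretical = np.array([reference.inv_cdf(p) for p in probs])
    return theoretical, x


# Hypothetical replicated estimates centered at 1
rng = np.random.default_rng(4)
theo_q, sample_q = qq_points(rng.normal(loc=1.0, scale=0.1, size=1000))
qq_corr = float(np.corrcoef(theo_q, sample_q)[0, 1])
```

When the estimates are close to normally distributed, the points lie near the 45-degree line and the correlation between the two sets of quantiles is close to one.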

Finally, we examine the size of t tests where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$. To this end, we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that, although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
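The empirical sizes discussed here are rejection frequencies across replications. A minimal sketch of that tabulation follows; it assumes two-sided tests with standard normal critical values, which the text does not state explicitly, and it uses hypothetical simulated estimates and standard errors in place of the Monte Carlo output.

```python
import numpy as np
from statistics import NormalDist


def rejection_rates(estimates, std_errors, beta0=1.0,
                    levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of replications in which |t| exceeds the normal critical value."""
    t_stats = (np.asarray(estimates) - beta0) / np.asarray(std_errors)
    rates = {}
    for level in levels:
        critical = NormalDist().inv_cdf(1.0 - level / 2.0)  # two-sided test
        rates[level] = float(np.mean(np.abs(t_stats) > critical))
    return rates


# Hypothetical replications: correctly centered estimates with known spread
rng = np.random.default_rng(5)
reps = 20000
fake_estimates = rng.normal(loc=1.0, scale=0.1, size=reps)
sizes = rejection_rates(fake_estimates, np.full(reps, 0.1))
```

When the standard errors are accurate and the estimator is approximately normal, each entry of `sizes` should be close to its nominal level.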

TABLE VI
SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction); $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)
Columns in each group: 0.01, 0.05, 0.10, 0.20

Panel A: True $\gamma$
0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B: $\hat\gamma_L$
0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C: $\hat\gamma_{CMS}$
0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D: $\hat\gamma_{SCMS4}$
0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250

5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMA A1: Let $S_n = \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n} L(W_i/h_n)\, Z_i W_i^s$, $s \ge 0$, where $\{(Z_i, W_i)\}_{i=1}^n$ is a random sample from a distribution that has $E(|Z|^2 \mid W) < M < \infty$ for almost all W, and the function L satisfies $\int |v^s L(v)|\,dv < M$. Then $E(S_n) = O(h_n^s)$ and $var(S_n) = O(h_n^{2s}/(nh_n))$. Thus, for $s \ge 1$, $S_n \to_p 0$, while for s = 0, $S_n \to_p f_W(0)\,E(Z \mid W = 0)\int L(v)\,dv$, provided that $E(Z \mid W)$ is continuous at W = 0.

PROOF: Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.
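The orders of magnitude claimed in Lemma A1 can be traced through a change of variables. The lines below are a sketch under the lemma's stated assumptions, not a reproduction of the displays in the original proof; the shorthand $m(W) \equiv E(Z \mid W)$ is introduced here for convenience.

```latex
\begin{align*}
E(S_n) &= \int \frac{1}{h_n} L\!\left(\frac{W}{h_n}\right) W^s\, m(W)\, f_W(W)\, dW
        = h_n^{s} \int L(v)\, v^{s}\, m(v h_n)\, f_W(v h_n)\, dv = O(h_n^{s}), \\
var(S_n) &\le \frac{1}{n}\, E\!\left[\frac{1}{h_n^{2}} L\!\left(\frac{W}{h_n}\right)^{2} Z^{2} W^{2s}\right]
        = \frac{h_n^{2s}}{n h_n} \int L(v)^{2}\, v^{2s}\, E(Z^{2} \mid W = v h_n)\, f_W(v h_n)\, dv
        = O\!\left(\frac{h_n^{2s}}{n h_n}\right).
\end{align*}
```

Both lines use the substitution $W = v h_n$. For s = 0, letting $h_n \to 0$ inside the first integral gives the limit $f_W(0)\,E(Z \mid W = 0)\int L(v)\,dv$ stated in the lemma.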

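LEMMA A2 (Liapounov CLT for double arrays): Let $S_n = \frac{1}{\sqrt n}\sum_{i=1}^n \xi_{in}$, where $\{\xi_{in}\}$ is an independent sequence of scalar random variables that satisfies $E(\xi_{in}) = 0$, $var(\xi_{in}) < \infty$, $var(S_n) \to V < \infty$, and $\sum_{i=1}^n E|\xi_{in}/\sqrt n|^{2+\delta} \to 0$ for some $\delta \in (0, 1)$ as $n \to \infty$. Then $S_n \to_d N(0, V)$.

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1973).

COROLLARY A1: Let $\xi_{in} = \frac{1}{\sqrt{h_n}} L(W_i/h_n) Z_i$, where $\{(Z_i, W_i)\}_{i=1}^n$ is a random sample from a distribution such that $E(Z \mid W) = 0$ and $E(|Z|^{2+\delta} \mid W) < M < \infty$ for almost all W, $E(Z^2 \mid W)$ is continuous at W = 0, and the function L satisfies $\int |L(v)|^{2+\delta}\,dv < \infty$. Then $\sqrt{nh_n}\,S_n = \frac{1}{\sqrt n}\sum_{i=1}^n \xi_{in} \to_d N\big(0,\; f_W(0)\,E(Z^2 \mid W = 0)\int L(v)^2\,dv\big)$.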

PROOF OF LEMMA 1: (a) Apply Lemma A1 with $Z = \Delta x_j \Delta x_l \Phi$ (j, l = 1, ..., k), s = 0, and $L(v) = K(v)$.

(b-i) Apply Lemma A2 with $\xi_i = c'(1/\sqrt{h_n})\,K(W_i/h_n)\,\Delta x_i' \Delta v_i \Phi_i$, where c is a k x 1 vector of constants such that $c'c = 1$.

(b-ii) Note that by Assumption R5, $\Delta\lambda = \Lambda W$. Thus we may write

$S_{x\lambda} = \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n} K(W_i/h_n)\,\Delta x_i' \Lambda_i W_i$.

Therefore $E(S_{x\lambda}) = \int \frac{1}{h_n} K(W/h_n)\, W g(W)\,dW$, where $g(W) \equiv E(\Delta x' \Lambda \mid W) f_W(W)$ is by assumption r times continuously differentiable with derivatives that are bounded on the support of W, and has $g(0) < \infty$. A Taylor series expansion of $g(\cdot)$ around 0 and a change of variables $W = vh_n$ lead to

for some $c^*$ lying between 0 and W, since $\int v^j K(v)\,dv = 0$ for j = 1, ..., r. Therefore, by bounded convergence,

since under our assumptions $\int |v|^{r+1} |K(v)|\,dv < \infty$ and by assumption $\sqrt{nh_n}\,h_n^{r+1} \to \sqrt{\tilde h}$. Furthermore, by Lemma A1, $var(S_{x\lambda}) = O(h_n^2/(nh_n))$, which implies that $var(\sqrt{nh_n}\,S_{x\lambda}) = O(nh_n)\,O(h_n^2/(nh_n)) = O(h_n^2) = o(1)$. Hence $\sqrt{nh_n}\,S_{x\lambda} \to_p \sqrt{\tilde h}\,\Sigma_{x\lambda}$.

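(c-i) Note that

while by Lemma A1, $var(S_{xv}) = O((nh_n)^{-1})$. Therefore $E(h_n^{-(r+1)} S_{xv}) = 0$ and $var(h_n^{-(r+1)} S_{xv}) = O(h_n^{-2(r+1)}(nh_n)^{-1}) = O((nh_n^{2(r+1)+1})^{-1}) = o(1)$, since by assumption $\sqrt{nh_n}\,h_n^{r+1} \to \infty$ as $n \to \infty$. Thus $h_n^{-(r+1)} S_{xv} \to_p 0$.

(c-ii) From part (b-ii) above

and

since $nh_n^{2(r+1)+1} \to \infty$ implies that $nh_n \to \infty$. Thus $h_n^{-(r+1)} S_{x\lambda} \to_p \Sigma_{x\lambda}$.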

REMARKS: (i) In what follows, M stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{trace(A'A)}$. (iii) In the Taylor series expansions, $c^*$ stands for a generic value between U and

PROOF OF LEMMA 2: (a) By a Taylor series expansion we can write

Therefore

since by assumption $\mu < p/2$, $|K'(v)| < \infty$, and $E(\|\Delta w\|\,\|\Delta x\|^2) < \infty$.

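(b-i) Let $\hat S^l_{xv}$ and $S^l_{xv}$ denote the lth (l = 1, ..., k) elements of $\hat S_{xv}$ and $S_{xv}$, respectively. A third order Taylor series expansion yields

$\sqrt{nh_n}\,(\hat S^l_{xv} - S^l_{xv}) = \frac{1}{h_n} A_1 (\hat\gamma_n - \gamma) + \frac{1}{2h_n^2} (\hat\gamma_n - \gamma)' A_2 (\hat\gamma_n - \gamma) + A_3$,

where $A_3$ collects the third order remainder term. We will show that $A_1$ and $A_2$ are $O_p(1)$, while $A_3 = o_p(1)$. The desired result will then follow from the fact that $\mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$.

Let $A_{1j}$ be the jth element (j = 1, ..., q) of the (1 x q) vector $A_1$. Write $A_{1j} = \frac{1}{\sqrt n}\sum_{i=1}^n \xi_i$, where $\xi_i = (1/\sqrt{h_n})\,K'(W_i/h_n)\,\Delta x^l_i \Delta v_i \Phi_i \Delta w_{ij}$. Note that $\{\xi_i\}_{i=1}^n$ is a sequence of scalar random variables that satisfies the requirements of Lemma A2, since under our assumptions $E(|\Delta x^l \Delta v\, \Delta w_j|^{2+\delta} \mid W) < \infty$ for almost all W, while $|K'(v)| < \infty$ and $\int |K'(v)|\,dv < \infty$ imply that $\int |K'(v)|^{2+\delta}\,dv < \infty$. Therefore $A_{1j}$ is bounded in probability.

Similarly, we can show that the jmth element (j, m = 1, ..., q) of the (q x q) matrix $A_2$ is also bounded in probability, by defining $\xi_i = (1/\sqrt{h_n})\,K''(W_i/h_n)\,\Delta x^l_i \Delta v_i \Phi_i \Delta w_{ij} \Delta w_{im}$, since $E(|\Delta x^l \Delta v\, \Delta w_j \Delta w_m|^{2+\delta} \mid W) < \infty$ for almost all W, and the boundedness and absolute integrability of $K''(v)$ imply that $\int |K''(v)|^{2+\delta}\,dv < \infty$.

Next observe that

$\|A_3\| \le M \sqrt{nh_n}\; h_n^{-4}\, \|\hat\gamma_n - \gamma\|^3\, \frac{1}{n}\sum_{i=1}^n \|\Delta w_i\|^3\, |\Delta x^l_i \Delta v_i| = O_p(n^{(1/2) + (7\mu/2) - 3p}) = o_p(1)$,

since $p > 2/5$ and $\mu < p/2$ imply that $(1/2) + (7\mu/2) - 3p < 0$.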

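(b-ii) Let $\hat{s}^{\,l}_{x\lambda}$ and $s^{\,l}_{x\lambda}$ denote the $l$th ($l = 1, \ldots, k$) elements of $\hat{S}_{x\lambda}$ and $S_{x\lambda}$, respectively. A third order Taylor series expansion yields

$$\sqrt{nh_n}\,(\hat{s}^{\,l}_{x\lambda} - s^{\,l}_{x\lambda}) = B_1\,\sqrt{nh_n}\,(\hat{\gamma}_n - \gamma) + \frac{1}{2}\sqrt{nh_n}\,(\hat{\gamma}_n - \gamma)'\,B_2\,\frac{1}{h_n}(\hat{\gamma}_n - \gamma) + B_3,$$

$$B_3 = \frac{1}{6\sqrt{n}} \sum_{i=1}^{n} \frac{1}{h_n^{7/2}}\,K'''(c_i)\,\Delta x_i^l\,\Delta\lambda_i\,\Phi_i\,\bigl(\Delta w_i(\hat{\gamma}_n - \gamma)\bigr)^3.$$

We will show that $B_1$ and $B_2$ are $O_p(1)$ while $B_3 = o_p(1)$. The desired result will then follow from the fact that $1 - 2p < \mu < p/2$ implies that $h_n^{-1}(\hat{\gamma}_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$ and $\sqrt{nh_n}\,(\hat{\gamma}_n - \gamma) = O_p(n^{(1-\mu)/2 - p}) = o_p(1)$.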

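Note that $B_1$ is a $(1 \times q)$ row-vector. For its $j$th element,

$$B_1^j = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h_n^2}\,K'\!\left(\frac{W_i}{h_n}\right)\Delta x_i^l\,\Delta\lambda_i\,\Phi_i\,\Delta w_i^j,$$

application of Lemma A1 with $s = 1$, $Z_i = \Delta x_i^l\,\Lambda_i\,\Phi_i\,\Delta w_i^j$, and $L(v) = K'(v)$ yields

$$E(B_1^j) = \frac{1}{h_n}\,O(h_n) = O(1) \quad \text{and} \quad \operatorname{var}(B_1^j) = \frac{1}{h_n^2}\,O\!\left(\frac{h_n^2}{nh_n}\right) = o(1),$$

since $E(\Delta x_l^2\,\Lambda^2\,\Delta w_j^2 \mid W) < \infty$ for almost all $W$ and $\int |vK'(v)|\,dv < \infty$.

Similarly, we can show that the $jm$th element ($j, m = 1, \ldots, q$) of the $(q \times q)$ matrix $B_2$ is also bounded in probability, since $E(\Delta x_l^2\,\Lambda^2\,\Delta w_j^2\,\Delta w_m^2 \mid W) < \infty$ for almost all $W$ and $\int |vK''(v)|\,dv < \infty$.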

Next observe that $\|B_3\| = o_p(1)$, since under our assumptions $(1/2) + (7\mu/2) - 3p < 0$, $\gamma$ lies in a compact set, and $E(\|\Delta x\|\,\|\Delta w\|^4) < \infty$.

(c-i) Note that with $h_n = h\,n^{-\mu}$, the condition $nh_n^{2(r+1)+1} \to \infty$ implies that $\mu < 1/(2(r+1)+1)$.

In what follows we will use the fact that for $r \ge 1$

Define $\hat{s}^{\,l}_{x\varepsilon}$ and $s^{\,l}_{x\varepsilon}$ as before. A third order Taylor series expansion yields

$$\frac{1}{h_n^{r+1}}\,(\hat{s}^{\,l}_{x\varepsilon} - s^{\,l}_{x\varepsilon}) = \frac{1}{\sqrt{nh_n}\,h_n^{r+1}}\,A_1\,\frac{1}{h_n}(\hat{\gamma}_n - \gamma) + \frac{1}{\sqrt{nh_n}\,h_n^{r+1}}\,\frac{1}{2h_n}(\hat{\gamma}_n - \gamma)'\,A_2\,\frac{1}{h_n}(\hat{\gamma}_n - \gamma) + A_4,$$

where $A_1$ and $A_2$ are defined as in the proof of part (b-i). As we showed there, both these quantities are bounded in probability for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$ as $n$ increases. Furthermore, from (1) above, $h_n^{-1}(\hat{\gamma}_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$. Thus the first two terms of the sum above are $o_p(1)$. Now by (2)

(c-ii) Let $\hat{s}^{\,l}_{x\lambda}$ and $s^{\,l}_{x\lambda}$ be defined as before. A third order Taylor series expansion yields

where $B_1$ and $B_2$ are defined as in the proof of part (b-ii), and as we showed there, they are bounded in probability for any $h_n$ that satisfies $nh_n \to \infty$ as $n$ increases. Thus the first two terms of the sum above are $o_p(1)$. Furthermore,

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics, Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamics Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.


sistency$^{14}$ of $\hat{\beta}_n$ if $h_n^{-2}(\hat{\gamma}_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies Assumption R8, i.e., for $h_n \to 0$ and $nh_n \to \infty$. On the other hand, the asymptotic normality result of Theorem 1 requires that $\sqrt{nh_n}\,(\hat{\gamma}_n - \gamma) = o_p(1)$ for any $h_n$ that satisfies $\sqrt{nh_n}\,h_n^{r+1} \to \tilde{h}$ with $0 \le \tilde{h} < \infty$. The conditions for obtaining consistency and asymptotic normality of $\hat{\beta}_n$ are satisfied by the conditional maximum likelihood estimator proposed by Rasch (1960, 1961) and Andersen (1970), which is consistent and root-n asymptotically normal under the assumption that the errors in the selection equation are white noise with a logistic distribution and independent of the regressors and the individual effects. In fact, as Chamberlain (1992) has shown, if the support of the predictor variables in the selection equation is bounded, then identification of $\gamma$ is possible only in the logistic case. Furthermore, even if the support is unbounded, in which case $\gamma$ may be identified and thus consistently estimated, consistent estimation at rate $n^{-1/2}$ is possible only in the logistic case. As is well known, though, if the distribution of the errors is misspecified, the conditional maximum likelihood approach will in general produce inconsistent estimators.

Another possible choice for estimating $\gamma$ is the conditional maximum score estimator proposed by Manski (1987). Under fairly weak distributional assumptions, this estimator consistently estimates $\gamma$ up to scale. However, the results of Cavanagh (1987) and Kim and Pollard (1990) for the maximum score estimator proposed by Manski (1975, 1985) for the cross section binary response model, namely that it converges at the slow rate of $n^{-1/3}$ to a non-normal random variable, suggest that these properties carry through to its panel data analog, the conditional maximum score estimator. Thus, if $(\hat{\gamma}_n - \gamma) = O_p(n^{-1/3})$, it is possible to consistently estimate $\beta$ by choosing $h_n$ to satisfy $n^{1/3} h_n^2 \to \infty$. In this case, though, the analysis for obtaining the asymptotic distribution of $\hat{\beta}_n$ is not applicable.

It is possible, however, to modify Manski's conditional maximum score estimator and obtain control over both its rate of convergence and its limiting distribution by imposing sufficient smoothness on the distribution of the errors and the explanatory variables in the selection equation. Specifically, following the approach taken by Horowitz (1992) for estimating the cross section binary response model, we can construct a smoothed conditional maximum score estimator which, under weak (but stronger than Manski's) assumptions, is consistent and asymptotically normally distributed, with a rate of convergence that can be arbitrarily close to $n^{-1/2}$, depending on the amount of smoothness we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

14 Consistency of $\hat{\beta}_n$ may be established under the weaker restriction that $h_n^{-1}\|\hat{\gamma}_n - \gamma\| = o_p(1)$. The proof of Lemma 2(a) would then have to be modified by taking a third instead of a first order Taylor series expansion. This modification does not alter the basic restriction for obtaining an asymptotic distribution for $\hat{\beta}_n$ which does not depend on the estimation of $\gamma$ in the first step, namely that $\gamma$ has to be estimated at a faster rate than $\beta$. Notice that in this case the upper bound on $\mu$ in Assumption R12 would have to be replaced by $(6p - 1)/7$. However, this modification would affect the proof of Theorem 2, which would become unnecessarily complicated and long.

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, the first step estimation method, the performance in practice of the proposed plug-in method for estimating the bandwidth constant, and, finally, the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where $\beta_0 = 1$; $\gamma_1 = \gamma_2 = 1$; $w_1$ and $w_2$ are independent $N(-1, 1)$ variables; $\eta = (w + w)/2 + 2\xi_1$, with $\xi_1$ an independent variable distributed uniformly over the interval $(0, 1)$; $u$ is logistically distributed, normalized to have variance equal to 1; $x = w$; $\alpha = (w + w)/2 + \xi_2$, with $\xi_2$ an independent $N(0, 2)$ variable; and $\varepsilon = 0.8\xi_3 + 0.6u$, with $\xi_3$ an independent standard normal variable. All data are generated i.i.d. across individuals and over time. This design implies that $\Pr(d_1 + d_2 = 1) = 0.37$ and $\Pr(d_1 = d_2 = 1) = 0.31$, so that approximately 37 percent of each sample is used in the first step estimation of the selection equation, and approximately 31 percent in the second step. Each Monte Carlo experiment is performed 1000 times, while the same pseudorandom number sequences are used for each one of three different sample sizes $n$: 250, 1000, and 4000.
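The error structure of this design can be illustrated with a short simulation. The sketch below is only a partial illustration under stated assumptions, not the code used for the experiments: it draws the selection error $u$ from a logistic distribution rescaled to unit variance and forms $\varepsilon = 0.8\xi_3 + 0.6u$, which then has variance $0.64 + 0.36 = 1$ and correlation 0.6 with $u$. The function name `simulate_errors`, the seed, and the two-period array layout are hypothetical choices made for the example; regressors and individual effects are deliberately left out.

```python
import numpy as np


def simulate_errors(n, rng):
    """Draw selection errors u and outcome errors eps for n individuals, 2 periods.

    Assumption of this sketch: u is logistic with scale sqrt(3)/pi, because a
    logistic variable with scale s has variance s^2 * pi^2 / 3.
    """
    scale = np.sqrt(3.0) / np.pi                 # gives var(u) = 1
    u = rng.logistic(loc=0.0, scale=scale, size=(n, 2))
    xi3 = rng.standard_normal(size=(n, 2))       # independent N(0, 1) component
    eps = 0.8 * xi3 + 0.6 * u                    # var = 0.64 + 0.36 = 1
    return u, eps


rng = np.random.default_rng(0)
u, eps = simulate_errors(200_000, rng)
print("var(u)      ~", round(float(u.var()), 3))
print("var(eps)    ~", round(float(eps.var()), 3))
print("corr(u,eps) ~", round(float(np.corrcoef(u.ravel(), eps.ravel())[0, 1]), 3))
```

The correlation between the two errors is what makes the selected sample non-random; with a correlation of zero the naive first-difference regression would not be affected by selection.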

Table I presents the finite sample properties of the "naive" estimator, denoted by $\hat{\beta}_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences, using only those individuals that are selected into the sample in both time periods, i.e., those that have $d_1 = d_2 = 1$. This estimator may be viewed as a limiting case of our proposed estimator with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$, at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

TABLE I

Panel A: Finite Sample Properties of $\hat{\beta}_{NAIVE}$
Columns: Mean Bias, Median Bias, RMSE, MAD

Panel B: Sizes of t Tests
Columns: 0.01, 0.05, 0.10, 0.20
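The naive estimator described above can be sketched in a few lines. This is a minimal illustration, assuming a scalar regressor, two time periods, and no intercept in the differenced regression; the function name `naive_fd_ols` and the array layout (one row per individual, one column per period) are hypothetical.

```python
import numpy as np


def naive_fd_ols(y, x, d):
    """OLS of the first difference of y on the first difference of x,
    using only individuals observed in both periods (d1 = d2 = 1).

    y, x, d: arrays of shape (n, 2); d holds 0/1 selection indicators.
    """
    y, x, d = (np.asarray(a, dtype=float) for a in (y, x, d))
    both = (d[:, 0] == 1) & (d[:, 1] == 1)       # selected in both periods
    dy = y[both, 1] - y[both, 0]                 # differencing removes alpha_i
    dx = x[both, 1] - x[both, 0]
    return float(dx @ dy / (dx @ dx))


# Noise-free check: with y = 2 x + alpha_i the slope 2 is recovered exactly.
x = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 2.5], [0.0, 5.0]])
alpha = np.array([10.0, -3.0, 7.0, 0.0])
y = 2.0 * x + alpha[:, None]
d = np.array([[1, 1], [1, 1], [1, 0], [1, 1]])
print(naive_fd_ols(y, x, d))
```

Differencing removes the individual effect but not the selection effect, which is why this estimator serves only as a benchmark.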

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat{\beta}_n$, obtained by specifying $r = 1$ and using $K(v) = \phi(v)$, where $\phi$ is the density of the standard normal distribution, which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h\,n^{-1/(2(r+1)+1)} = h\,n^{-1/5}$ with $h = 1$. The panels on the right-hand side present the results for $\tilde{\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.$^{15}$ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat{\gamma}_L$, which in this case will be consistent, since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,$^{16}$ denoted by $\hat{\gamma}_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator,$^{17}$ denoted by $\hat{\gamma}_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat{\beta}_n$ AND $\tilde{\beta}_n$: $h_n = n^{-1/5}$, $K(v) = \phi(v)$

$\hat{\beta}_n$ (Without Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD
$\tilde{\beta}_n$ (With Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD

Panel A, True $\gamma$: 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B, $\hat{\gamma}_L$: 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C, $\hat{\gamma}_{CMS}$: 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D, $\hat{\gamma}_{SCMS4}$: 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E, $\hat{\gamma}_{SCMS2}$: 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098
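The bandwidth sequence and the kernel weighting used in these experiments can be sketched as follows. This is an illustrative sketch, assuming the second step is weighted least squares on first differences with weights $(1/h_n)K(\Delta w\,\hat{\gamma}/h_n)$ applied to individuals observed in both periods; the function names and argument layout are hypothetical, and the asymptotic bias correction is not implemented.

```python
import numpy as np


def normal_kernel(v):
    """Standard normal density, the second order kernel K(v) = phi(v)."""
    return np.exp(-0.5 * np.asarray(v, dtype=float) ** 2) / np.sqrt(2.0 * np.pi)


def bandwidth(n, h=1.0, r=1):
    """Bandwidth sequence h_n = h * n^(-1 / (2(r+1)+1)); r = 1 gives h * n^(-1/5)."""
    return h * n ** (-1.0 / (2 * (r + 1) + 1))


def kernel_weighted_fd(dy, dx, dw, gamma, both, h_n, kernel=normal_kernel):
    """Kernel-weighted least squares on first differences.

    dy: (n,) differenced outcomes; dx: (n, k) differenced regressors;
    dw: (n, q) differenced selection regressors; gamma: (q,) first-step estimate;
    both: (n,) 0/1 indicator of being observed in both periods.
    """
    dy = np.asarray(dy, dtype=float)
    dx = np.asarray(dx, dtype=float)
    index = np.asarray(dw, dtype=float) @ np.asarray(gamma, dtype=float)
    weights = kernel(index / h_n) / h_n * np.asarray(both, dtype=float)
    sxx = (dx * weights[:, None]).T @ dx         # weighted cross products
    sxy = (dx * weights[:, None]).T @ dy
    return np.linalg.solve(sxx, sxy)


# Noise-free check: any positive weights recover the true coefficients.
rng = np.random.default_rng(1)
dx = rng.normal(size=(50, 2))
dw = rng.normal(size=(50, 2))
beta = np.array([1.0, -2.0])
est = kernel_weighted_fd(dx @ beta, dx, dw, np.array([1.0, 1.0]),
                         np.ones(50), bandwidth(50))
print(est)
```

Observations whose selection index changes little between periods receive the largest weights, which is the sense in which the naive estimator corresponds to an infinite bandwidth (equal weights).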

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E) the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.

15 In the construction of the kernel weights of both the infeasible estimator of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i} \Delta d_i\,1\{\Delta w_{1i}\,g_1 + \Delta w_{2i}\,g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.

17 The SCMS estimates are computed by maximizing the smoothed objective function over all $g \in \mathbb{R}^2$ that have $|g_2| = 1$ and $g_1$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(v) = K_4(v)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat{\gamma}_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat{\beta}_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(v) = \Phi(v)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat{\gamma}_{SCMS2}$, converges in distribution at the same rate as $\hat{\beta}_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$, and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5\,n^{-1/(2h+1)}$ ($h = 2$ or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).
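The grid search used for the conditional maximum score (CMS) estimates, with the coefficient vector parameterized on the unit circle as $(\sin g, \cos g)$, can be sketched as below. This is a minimal illustration rather than the routine used in the study: the function name `cms_grid_search`, the array layout, and the sign convention (second period minus first period for both $\Delta d$ and $\Delta w$) are assumptions of the example.

```python
import numpy as np


def cms_grid_search(d, w, n_grid=2000):
    """Maximize (1/n) * sum_i dd_i * 1{dw_i1*g1 + dw_i2*g2 >= 0} over
    g = (sin(a), cos(a)), with a on an equispaced grid from 0 to 2*pi.

    d: (n, 2) array of 0/1 selection indicators for the two periods.
    w: (n, 2, 2) array, w[i, t, :] = the two selection regressors in period t.
    """
    d = np.asarray(d, dtype=float)
    w = np.asarray(w, dtype=float)
    dd = d[:, 1] - d[:, 0]                        # takes values -1, 0, 1
    dw = w[:, 1, :] - w[:, 0, :]                  # (n, 2) differenced regressors
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid)
    g = np.column_stack([np.sin(angles), np.cos(angles)])   # unit-norm candidates
    index = dw @ g.T                              # (n, n_grid) candidate indices
    scores = (dd[:, None] * (index >= 0)).mean(axis=0)
    best = int(np.argmax(scores))                 # first maximizer on the grid
    return g[best], float(scores[best])


# Tiny deterministic example: the score is maximized in the first quadrant.
dw = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
w = np.zeros((4, 2, 2))
w[:, 1, :] = dw
d = np.array([[0, 1], [1, 0], [0, 1], [1, 0]])
g_hat, score = cms_grid_search(d, w)
print(g_hat, score)
```

Because the objective is a step function, the maximizer is generally a set rather than a point; the sketch simply returns the first grid point attaining the maximum.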

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\hat{\beta}_n$ and $\tilde{\beta}_n$ using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).$^{18}$

In Panels C and D we use a fourth and a sixth order bias-reducing kernel,$^{19}$ and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.
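The bias-reducing property of the higher order kernels can be checked numerically. The sketch below writes the fourth and sixth order kernels of footnote 19 as mixtures of normal densities and verifies their moment conditions by a simple Riemann sum. It assumes the kernels are normalized by $1/\sqrt{2\pi}$ so that they integrate to one (the footnote as transcribed shows no normalizing constant); the function names are hypothetical.

```python
import numpy as np


def normal_density(v, sigma=1.0):
    """Density of N(0, sigma^2) evaluated at v."""
    v = np.asarray(v, dtype=float)
    return np.exp(-0.5 * (v / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def kernel_order4(v):
    """1.1*phi(v) - 0.1*phi_11(v), phi_11 being the N(0, 11) density."""
    return 1.1 * normal_density(v) - 0.1 * normal_density(v, np.sqrt(11.0))


def kernel_order6(v):
    """1.5*phi(v) - 0.6*phi_4(v) + 0.1*phi_9(v), with N(0, 4) and N(0, 9) densities."""
    return (1.5 * normal_density(v) - 0.6 * normal_density(v, 2.0)
            + 0.1 * normal_density(v, 3.0))


def kernel_moment(kernel, j, limit=80.0, points=400_001):
    """Approximate the j-th moment, the integral of v^j K(v) dv, by a Riemann sum."""
    v = np.linspace(-limit, limit, points)
    step = v[1] - v[0]
    return float(np.sum(v ** j * kernel(v)) * step)


for name, k in [("order 4", kernel_order4), ("order 6", kernel_order6)]:
    print(name, [round(kernel_moment(k, j), 6) for j in (0, 2, 4)])
```

A kernel of order $r + 1$ integrates to one and has vanishing moments of orders 1 through $r$; the odd moments vanish here by symmetry, so only the even ones need checking. The fourth order kernel has a zero second moment but a nonzero fourth moment, while the sixth order kernel has both equal to zero.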

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat{h}$ is used, while their RMSE decreases (except in Panel IV-D). This is because, in general, $\hat{h}$ is larger than the initial constant (here the initial bandwidth constant is set equal to one$^{20}$). Table V displays the mean of $\hat{h}$ across 1000 replications for different specifications of the initial constant for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat{h}$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

TABLE III

FINITE SAMPLE PROPERTIES OF $\hat{\beta}_n$ AND $\tilde{\beta}_n$: TRUE $\gamma$

$\hat{\beta}_n$ (Without Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD
$\tilde{\beta}_n$ (With Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD

Panel A, $K(v) = \phi(v)$, $h_n = 0.5\,n^{-1/5}$: 0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B, $K(v) = \phi(v)$, $h_n = 3\,n^{-1/5}$: 0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C, fourth order kernel, $h_n = n^{-1/9}$: 0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D, sixth order kernel, $h_n = n^{-1/13}$: 0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for $n$ = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A and 0.0045, 0.0026, and 0.0014 for Panel B, respectively. The estimated standard errors of the median bias estimates for $n$ = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat{\beta}_n$ AND $\tilde{\beta}_n$: $h_n = \hat{h}\,n^{-1/5}$, INITIAL $h = 1$, $K(v) = \phi(v)$

$\hat{\beta}_n$ (Without Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD
$\tilde{\beta}_n$ (With Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD

Panel A, True $\gamma$: 0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B, $\hat{\gamma}_L$: 0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C, $\hat{\gamma}_{CMS}$: 0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D, $\hat{\gamma}_{SCMS}$: 0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401

TABLE V

Columns: Initial $h = 0.5$, Initial $h = 1$, Initial $h = 2$, Initial $h = 3$

18 To estimate the standard errors for the median bias, we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

19 The fourth-order kernel is $K(v) = 1.1\exp(-v^2/2) - 0.1\exp(-v^2/(2 \cdot 11))(1/\sqrt{11})$, and the sixth-order kernel is $K(v) = 1.5\exp(-v^2/2) - 0.6\exp(-v^2/(2 \cdot 4))(1/\sqrt{4}) + 0.1\exp(-v^2/(2 \cdot 9))(1/\sqrt{9})$. See Bierens (1987).

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size $n = 100000$.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat{\beta}_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat{\beta}_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

FIGURE 1. Quantile-quantile plots of $\hat{\beta}_n$ against a Normal; $h_n = n^{-1/5}$, $K(v) = \phi(v)$. Note: Figures 1a, 1d, 1g: $n$ = 250; Figures 1b, 1e, 1h: $n$ = 1000; Figures 1c, 1f, 1i: $n$ = 4000.
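The points of such a quantile-quantile plot can be computed as in the sketch below, which pairs empirical quantiles of a set of Monte Carlo estimates with the quantiles of a normal distribution having the same mean and variance. The function name `qq_points` and the choice of probability grid are hypothetical; the plotting itself is omitted.

```python
from statistics import NormalDist

import numpy as np


def qq_points(estimates, probs):
    """Return (normal quantiles, sample quantiles) at the given probabilities.

    The reference normal has the sample mean and sample variance of `estimates`.
    """
    est = np.asarray(estimates, dtype=float)
    reference = NormalDist(mu=float(est.mean()), sigma=float(est.std(ddof=1)))
    sample_q = np.quantile(est, probs)                        # empirical quantiles
    normal_q = np.array([reference.inv_cdf(p) for p in probs])
    return normal_q, sample_q


# Example with simulated "estimates"; points near the 45-degree line suggest normality.
rng = np.random.default_rng(2)
draws = 1.0 + 0.1 * rng.standard_normal(1000)
probs = np.linspace(0.01, 0.99, 99)
normal_q, sample_q = qq_points(draws, probs)
print(float(np.max(np.abs(normal_q - sample_q))))
```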

Finally, we examine the size of t tests, where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$. To this end, we construct t statistics for $\hat{\beta}_n$ and $\tilde{\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests for $\hat{\beta}_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat{\beta}_n$ (and $\tilde{\beta}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat{\gamma}_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
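The empirical size of these tests is simply the fraction of replications in which a two-sided t test rejects the true value. A minimal sketch, assuming standard normal critical values and hypothetical names (`empirical_size`, `estimates`, `std_errors`), is:

```python
from statistics import NormalDist

import numpy as np


def empirical_size(estimates, std_errors, true_value, levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of replications rejecting H0: beta = true_value at each nominal level."""
    t_stats = (np.asarray(estimates, dtype=float) - true_value) / np.asarray(
        std_errors, dtype=float)
    sizes = {}
    for level in levels:
        critical = NormalDist().inv_cdf(1.0 - level / 2.0)   # two-sided critical value
        sizes[level] = float(np.mean(np.abs(t_stats) > critical))
    return sizes


# Four replications with t statistics 0, 0.5, 2 and -1.
print(empirical_size([1.0, 1.5, 3.0, 0.0], [1.0, 1.0, 1.0, 1.0], true_value=1.0))
```

A test with correct size rejects a true null hypothesis with frequency close to the nominal level, so the entries of such a table are compared with 0.01, 0.05, 0.10, and 0.20.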

TABLE VI

SIZE OF t TESTS USING $\hat{\beta}_n$ AND $\tilde{\beta}_n$: $h_n = n^{-1/5}$, $K(v) = \phi(v)$

$\hat{\beta}_n$ (Without Asymptotic Bias Correction): 0.01, 0.05, 0.10, 0.20
$\tilde{\beta}_n$ (With Asymptotic Bias Correction): 0.01, 0.05, 0.10, 0.20

Panel A, True $\gamma$: 0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B, $\hat{\gamma}_L$: 0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C, $\hat{\gamma}_{CMS}$: 0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D, $\hat{\gamma}_{SCMS}$: 0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250

5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the "fixed-effects" approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions, and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMAAl Let S = is a random sam- (ln)Z=l ( l h )L (M h )Z v s 2 0 where (Z y)]= ple from a disirrbuiron that has E ( I Z I ~ I for L ~at~sfiesW )ltM lt almost all W and the functron

P ~ l v(v ) l d v lt M Then E ( S ) = O ( k i ) and var(S) = O(hnh) Tlzus for s 2 1 S + 0 while for

P s = 0 S +f(O)E(ZI W = O)lL(v)d v procrded that E ( Z I W ) rs contrnuo~ts at W = 0

SAMPLE SELECTION MODEL

PROOF: Random sampling implies that

Under our assumptions and by bounded convergence, we obtain

The stated probability limits then obtain by Chebyshev's theorem.
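The two steps of this proof can be sketched as follows. This is a hedged reconstruction, not the author's displays: it assumes $S_n = n^{-1}\sum_i h_n^{-1} L(W_i/h_n) Z_i W_i^{s}$, writes $f_W$ for the density of $W$ (assumed bounded), uses the change of variables $W = v h_n$, and assumes that $v^{2s} L(v)^{2}$ is integrable.

```latex
\begin{align*}
E(S_n) &= \int \frac{1}{h_n} L\!\left(\frac{W}{h_n}\right) W^{s}\, E(Z \mid W)\, f_W(W)\, dW \\
       &= h_n^{s} \int v^{s} L(v)\, E(Z \mid W = v h_n)\, f_W(v h_n)\, dv = O(h_n^{s}), \\[4pt]
\operatorname{var}(S_n) &\le \frac{1}{n}\, E\!\left[ \frac{1}{h_n^{2}} L\!\left(\frac{W}{h_n}\right)^{2} W^{2s} Z^{2} \right] \\
       &= \frac{h_n^{2s}}{n h_n} \int v^{2s} L(v)^{2}\, E(Z^{2} \mid W = v h_n)\, f_W(v h_n)\, dv
        = O\!\left( \frac{h_n^{2s}}{n h_n} \right).
\end{align*}
```

For $s = 0$, letting $h_n \to 0$ inside the first integral gives the limit $f_W(0)\,E(Z \mid W = 0)\int L(v)\,dv$, and the variance bound together with Chebyshev's inequality gives convergence in probability.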

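LEMMA A2 (Liapounov CLT for double arrays): Let $\bar\xi_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \xi_{in}$, where $\{\xi_{in}\}_{i=1}^{n}$ is an independent sequence of scalar random variables that satisfies $E(\xi_{in}) = 0$, $\operatorname{var}(\xi_{in}) < \infty$, $\operatorname{var}(\bar\xi_n) \to V < \infty$, and $\sum_{i=1}^{n} E\,|\xi_{in}/\sqrt{n}|^{2+\delta} \to 0$ for some $\delta \in (0, 1)$ as $n \to \infty$. Then $\bar\xi_n \xrightarrow{d} N(0, V)$.

PROOF: See Theorem 7.1.2 and the comment on page 209 in Chung (1974).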

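COROLLARY A1: Let $\xi_{in} = \frac{1}{\sqrt{h_n}} L\!\left(\frac{W_i}{h_n}\right) Z_i$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution such that $E(Z \mid W) = 0$ and $E(|Z|^{2+\delta} \mid W) < M < \infty$ for almost all $W$, $E(Z^{2} \mid W)$ is continuous at $W = 0$, and the function $L$ satisfies $\int |L(v)|^{2+\delta}\,dv < \infty$. Then $\sqrt{n h_n}\,S_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \xi_{in} \xrightarrow{d} N\!\left(0,\ f_W(0)\,E(Z^{2} \mid W = 0) \int L(v)^{2}\,dv\right)$.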

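PROOF OF LEMMA 1: (a) Apply Lemma A1 with $Z_i = \Delta x_{ij}\,\Delta x_{il}\,\Phi_i$ ($j, l = 1, \dots, k$), $s = 0$, and $L(v) = K(v)$.

(b-i) Apply Lemma A2 with $\xi_{in} = c'\,\frac{1}{\sqrt{h_n}} K\!\left(\frac{W_i}{h_n}\right) \Delta x_i'\,\Delta\nu_i\,\Phi_i$, where $c$ is a $k \times 1$ vector of constants such that $c'c = 1$.

(b-ii) Note that by Assumption R5, $\Delta\lambda_i = \Lambda_i W_i$. Thus we may write

$$S_{x\lambda} = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h_n} K\!\left(\frac{W_i}{h_n}\right) \Delta x_i'\,\Lambda_i\,W_i\,\Phi_i .$$

Therefore $E(S_{x\lambda}) = \int \frac{1}{h_n} K\!\left(\frac{W}{h_n}\right) W\,g(W)\,dW$, where $g(W) \equiv E(\Delta x'\,\Lambda\,\Phi \mid W)\,f_W(W)$ is by assumption $r$ times continuously differentiable with derivatives that are bounded on the support of $W$, and has $g(0) < \infty$. A Taylor series expansion of $g(\cdot)$ around 0 and a change of variables $W = v h_n$ lead to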


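for some $c_n^{*}$ lying between 0 and $W$, since $\int v^{j} K(v)\,dv = 0$ for $j = 1, \dots, r$. Therefore, by bounded convergence,

since under our assumptions $\int |v|^{r+1} |K(v)|\,dv < \infty$ and by assumption $\sqrt{n h_n}\,h_n^{r+1} \to \sqrt{\tilde h}$. Furthermore, by Lemma A1, $\operatorname{var}(S_{x\lambda}) = O(h_n^{2}/(n h_n))$, which implies that $\operatorname{var}(\sqrt{n h_n}\,S_{x\lambda}) = O(n h_n)\,O(h_n/n) = O(h_n^{2}) = o(1)$. Hence $\sqrt{n h_n}\,S_{x\lambda} \xrightarrow{p} \sqrt{\tilde h}\,\Sigma_{x\lambda}$.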
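The expansion behind part (b-ii) can be written out as a worked sketch. It is a reconstruction under stated assumptions rather than the author's own display: $g$ is the $r$ times continuously differentiable function defined in the proof, $c_n^{*}$ lies between 0 and $v h_n$, $\tilde h$ denotes the limit of $n h_n^{2(r+1)+1}$, and the kernel satisfies $\int v^{j} K(v)\,dv = 0$ for $j = 1, \dots, r$.

```latex
\begin{align*}
E(S_{x\lambda})
  &= \int \frac{1}{h_n} K\!\left(\frac{W}{h_n}\right) W\, g(W)\, dW
   = h_n \int v\, K(v)\, g(v h_n)\, dv \\
  &= h_n \int v\, K(v) \left[ \sum_{j=0}^{r-1} \frac{g^{(j)}(0)}{j!} (v h_n)^{j}
       + \frac{g^{(r)}(c_n^{*})}{r!} (v h_n)^{r} \right] dv \\
  &= \frac{h_n^{r+1}}{r!} \int v^{r+1} K(v)\, g^{(r)}(c_n^{*})\, dv , \\[4pt]
\sqrt{n h_n}\; E(S_{x\lambda})
  &\longrightarrow \sqrt{\tilde h}\; \frac{g^{(r)}(0)}{r!} \int v^{r+1} K(v)\, dv .
\end{align*}
```

The terms with $j = 0, \dots, r-1$ drop out because they involve $\int v^{j+1} K(v)\,dv$ with $1 \le j + 1 \le r$, which is why only the remainder of order $h_n^{r+1}$ survives.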

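(c-i) Note that

while by Lemma A1, $\operatorname{var}(S_{x\nu}) = O((n h_n)^{-1})$. Therefore $E(h_n^{-(r+1)} S_{x\nu}) = 0$ and $\operatorname{var}(h_n^{-(r+1)} S_{x\nu}) = O(h_n^{-2(r+1)} (n h_n)^{-1})$. Since by assumption $\sqrt{n h_n}\,h_n^{r+1} \to \infty$ as $n \to \infty$, $h_n^{-2(r+1)} (n h_n)^{-1} = (n h_n^{2(r+1)+1})^{-1} = o(1)$. Thus $h_n^{-(r+1)} S_{x\nu} \xrightarrow{p} 0$.

(c-ii) From part (b-ii) above,

and

since $\sqrt{n h_n}\,h_n^{r+1} \to \infty$ implies that $n h_n \to \infty$. Thus $h_n^{-(r+1)} S_{x\lambda} \xrightarrow{p} \Sigma_{x\lambda}$.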

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\operatorname{trace}(A'A)}$.

(iii) In the Taylor series expansions, $c_i^{*}$ stands for a generic value between $\hat W_i$ and $W_i$.

PROOF OF LEMMA 2: (a) By a Taylor series expansion we can write

Therefore

since by assumption $\mu < p/2$, $|K'(v)| < \infty$, and $E(\|\Delta w\|\,\|\Delta x\|^{2}) < \infty$.


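(b-i) Let $\hat S_{x\nu}^{l}$ and $S_{x\nu}^{l}$ denote the $l$th ($l = 1, \dots, k$) elements of $\hat S_{x\nu}$ and $S_{x\nu}$, respectively. A third order Taylor series expansion yields

$$\sqrt{n h_n}\,(\hat S_{x\nu}^{l} - S_{x\nu}^{l}) = A_1\,\frac{1}{h_n}(\hat\gamma_n - \gamma) + \frac{1}{2}\,\frac{1}{h_n}(\hat\gamma_n - \gamma)'\,A_2\,\frac{1}{h_n}(\hat\gamma_n - \gamma) + A_3,$$

$$A_3 = \frac{1}{6}\,\frac{1}{\sqrt{n h_n}} \sum_{i=1}^{n} K'''\!\left(\frac{c_i^{*}}{h_n}\right) \Delta x_{il}\,\Delta\nu_i\,\Phi_i\,\frac{(\Delta w_i(\hat\gamma_n - \gamma))^{3}}{h_n^{3}} .$$

We will show that $A_1$ and $A_2$ are $O_p(1)$ while $A_3 = o_p(1)$. The desired result will then follow from the fact that $\mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$.

Let $A_1^{j}$ be the $j$th element ($j = 1, \dots, q$) of the $(1 \times q)$ vector $A_1$. Write $A_1^{j} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \xi_{in}$, where $\xi_{in} = \frac{1}{\sqrt{h_n}} K'\!\left(\frac{W_i}{h_n}\right) \Delta x_{il}\,\Delta\nu_i\,\Phi_i\,\Delta w_{ij}$. Note that $\{\xi_{in}\}_{i=1}^{n}$ is a sequence of scalar random variables that satisfies the requirements of Lemma A2, since under our assumptions $E(|\Delta x_l\,\Delta\nu\,\Delta w_j|^{2+\delta} \mid W) < \infty$ for almost all $W$, while $|K'(v)| < \infty$ and $\int |K'(v)|\,dv < \infty$ imply that $\int |K'(v)|^{2+\delta}\,dv < \infty$. Therefore $A_1^{j}$ is bounded in probability.

Similarly, we can show that the $(j, m)$th element ($j, m = 1, \dots, q$) of the $(q \times q)$ matrix $A_2$ is also bounded in probability, by defining $\xi_{in} = \frac{1}{\sqrt{h_n}} K''\!\left(\frac{W_i}{h_n}\right) \Delta x_{il}\,\Delta\nu_i\,\Phi_i\,\Delta w_{ij}\,\Delta w_{im}$, since $E(|\Delta x_l\,\Delta\nu\,\Delta w_j\,\Delta w_m|^{2+\delta} \mid W) < \infty$ for almost all $W$, and the boundedness and absolute integrability of $K''(v)$ implies that $\int |K''(v)|^{2+\delta}\,dv < \infty$.

Next observe that, since $p > 2/5$ and $\mu < p/2$ imply that $(1/2) + (7\mu/2) - 3p < 0$,

$$\|A_3\| \le \frac{1}{6}\,M\,\frac{\sqrt{n}}{h_n^{7/2}}\,\|\hat\gamma_n - \gamma\|^{3}\,\frac{1}{n}\sum_{i=1}^{n} \|\Delta w_i\|^{3}\,|\Delta x_{il}\,\Delta\nu_i| = O_p(n^{1/2 + 7\mu/2 - 3p}) = o_p(1).$$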

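(b-ii) Let $\hat S_{x\lambda}^{l}$ and $S_{x\lambda}^{l}$ denote the $l$th ($l = 1, \dots, k$) elements of $\hat S_{x\lambda}$ and $S_{x\lambda}$, respectively. A third order Taylor series expansion yields

$$\sqrt{n h_n}\,(\hat S_{x\lambda}^{l} - S_{x\lambda}^{l}) = B_1\,\sqrt{n h_n}\,(\hat\gamma_n - \gamma) + \frac{1}{2}\,\sqrt{n h_n}\,(\hat\gamma_n - \gamma)'\,B_2\,\frac{1}{h_n}(\hat\gamma_n - \gamma) + B_3,$$

$$B_3 = \frac{1}{6}\,\frac{1}{\sqrt{n h_n}} \sum_{i=1}^{n} K'''\!\left(\frac{c_i^{*}}{h_n}\right) \Delta x_{il}\,\Delta\lambda_i\,\Phi_i\,\frac{(\Delta w_i(\hat\gamma_n - \gamma))^{3}}{h_n^{3}} .$$

We will show that $B_1$ and $B_2$ are $O_p(1)$ while $B_3 = o_p(1)$. The desired result will then follow from the fact that $1 - 2p < \mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$ and $\sqrt{n h_n}\,(\hat\gamma_n - \gamma) = O_p(n^{(1-\mu)/2 - p}) = o_p(1)$.

Note that $B_1$ is a $(1 \times q)$ row-vector. For its $j$th element,

application of Lemma A1 with $s = 1$, $Z_i = \Delta x_{il}\,\Lambda_i\,\Phi_i\,\Delta w_{ij}$, and $L(v) = K'(v)$ yields

$E(B_1^{j}) = \frac{1}{h_n}\,O(h_n) = O(1)$ and

since $E(\Delta x_l^{2}\,\Lambda^{2}\,\Delta w_j^{2} \mid W) < \infty$ for almost all $W$ and $\int |v K'(v)|^{2}\,dv < \infty$.

Similarly, we can show that the $(j, m)$th element ($j, m = 1, \dots, q$) of the $(q \times q)$ matrix $B_2$

is also bounded in probability, since $E(\Delta x_l^{2}\,\Lambda^{2}\,\Delta w_j^{2}\,\Delta w_m^{2} \mid W) < \infty$ for almost all $W$ and $\int |v K''(v)|^{2}\,dv < \infty$.

Next observe that

since under our assumptions $(1/2) + (7\mu/2) - 3p < 0$, $\gamma$ lies in a compact set, and $E(\|\Delta x\|\,\|\Delta w\|^{4}) < \infty$.

(c-i) Note that with $h_n = h\,n^{-\mu}$, the condition $\sqrt{n h_n}\,h_n^{r+1} \to \infty$ implies that $\mu < 1/(2(r+1)+1)$.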

In what follows we will use the fact that for $r \ge 1$,

Define $\hat S_{x\nu}^{l}$ and $S_{x\nu}^{l}$ as before. A third order Taylor series expansion yields

$$h_n^{-(r+1)}(\hat S_{x\nu}^{l} - S_{x\nu}^{l}) = \frac{1}{\sqrt{n h_n}\,h_n^{r+1}}\,A_1\,\frac{1}{h_n}(\hat\gamma_n - \gamma) + \frac{1}{\sqrt{n h_n}\,h_n^{r+1}}\,\frac{1}{2}\,\frac{1}{h_n}(\hat\gamma_n - \gamma)'\,A_2\,\frac{1}{h_n}(\hat\gamma_n - \gamma) + A_4,$$

where $A_1$ and $A_2$ are defined as in the proof of part (b-i). As we showed there, both these quantities are bounded in probability for any $h_n$ that satisfies $h_n \to 0$ and $n h_n \to \infty$ as $n$ increases. Furthermore, from (1) above, $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$. Thus, the first two terms of the sum above are $o_p(1)$. Now, by (2),

(c-ii) Let $\hat S_{x\lambda}^{l}$ and $S_{x\lambda}^{l}$ be defined as before. A third order Taylor series expansion yields

where $B_1$ and $B_2$ are defined as in the proof of part (b-ii) and, as we showed there, they are bounded in probability for any $h_n$ that satisfies $n h_n \to \infty$ as $n$ increases. Thus, the first two terms of the sum above are $o_p(1)$. Furthermore,

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics, Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

CHAMBERLAIN, G. (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

HECKMAN, J. J. (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 5, 475-492.

HECKMAN, J. J. (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

HONORÉ, B. E. (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

KYRIAZIDOU, E. (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

MANSKI, C. (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

MANSKI, C. (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

POWELL, J. L. (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

RASCH, G. (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.


we are willing to assume for the underlying distributions. This estimator is considered in an earlier version of the paper (Kyriazidou (1994)) and also in Charlier et al. (1995).

4. MONTE CARLO EVIDENCE

In this section we illustrate certain finite sample properties of the proposed estimator. The Monte Carlo results presented here are in no sense representative of the estimator's sampling behavior, since only one experimental design is considered. Further, there is little justification for the choice of the particular design, except that it is simple to set up and that, in the absence of sample selectivity, ordinary least squares on the first differences would perform quite well. The simulation study of this section is intended more as an investigation of the sensitivity of the estimator to the choice of bandwidth, the order of the kernel, the proposed asymptotic bias correction, the first step estimation method, the performance in practice of the proposed plug-in method for estimating the bandwidth constant, and finally the practical usefulness of the proposed covariance matrix estimator in testing hypotheses about the main regression equation coefficients.

Data for the Monte Carlo experiments are generated according to the model

where $\beta_0 = 1$; $\gamma_{10} = \gamma_{20} = 1$; $w_{1it}$ and $w_{2it}$ are independent $N(-1, 1)$ variables; $\eta_i = (w_{1i1} + w_{1i2})/2 + 2\zeta_{1i}$, with $\zeta_{1i}$ an independent variable distributed uniformly over the interval $(0, 1)$; $u_{it}$ is logistically distributed, normalized to have variance equal to 1; $x_{it}^{*} = w_{2it}$; $\alpha_i^{*} = (w_{2i1} + w_{2i2})/2 + \zeta_{2i}$, with $\zeta_{2i}$ an independent $N(0, 2)$ variable; and $\varepsilon_{it}^{*} = 0.8\,\zeta_{3i} + 0.6\,u_{it}$, with $\zeta_{3i}$ an independent standard normal variable. All data are generated i.i.d. across individuals and over time. This design implies that $\Pr(d_{i1} + d_{i2} = 1) = 0.37$ and $\Pr(d_{i1} = d_{i2} = 1) = 0.31$, so that approximately 37 percent of each sample is used in the first step estimation of the selection equation, and approximately 31 percent in the second step. Each Monte Carlo experiment is performed 1000 times, while the same pseudorandom number sequences are used for each one of three different sample sizes $n$: 250, 1000, and 4000.

Table I presents the finite sample properties of the naive estimator, denoted by $\hat\beta_{NAIVE}$, that ignores sample selectivity and is therefore inconsistent. This estimator is obtained by applying OLS on the first differences using only those individuals that are selected into the sample in both time periods, i.e., those that have $d_{i1} = d_{i2} = 1$. This estimator may be viewed as a limiting case of our proposed estimator with bandwidth equal to infinity. Panel A reports the estimated mean bias and root mean squared error (RMSE) for this estimator over 1000 replications for different sample sizes $n$. As the estimator may not have a finite mean or variance in any finite sample, we also report its median bias and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$, at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$ (columns: Mean Bias, Median Bias, RMSE, MAD)

Panel B: Sizes of $t$ tests (columns: 0.01, 0.05, 0.10, 0.20)
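The remark that the naive estimator is a limiting case of the proposed one can be illustrated with a minimal sketch. It assumes that the proposed estimator is a least squares fit on first differences in which each selected unit is weighted by a normal kernel evaluated at its differenced selection index divided by the bandwidth. The function and argument names are hypothetical, not taken from the paper.

```python
import numpy as np

def kernel_weighted_fd(dy, dx, index_diff, selected, h):
    """Kernel-weighted least squares on first differences.

    dy, dx     : first differences of the outcome (n,) and regressors (n, k)
    index_diff : first difference of the selection index, one value per unit
    selected   : boolean mask, True when the unit is observed in both periods
    h          : bandwidth; np.inf gives every selected unit the same weight
    """
    dy, dx = dy[selected], dx[selected]
    u = index_diff[selected] / h            # all zeros when h is np.inf
    w = np.exp(-0.5 * u**2)                 # normal kernel; constants cancel in the ratio
    xtwx = dx.T @ (w[:, None] * dx)
    xtwy = dx.T @ (w * dy)
    return np.linalg.solve(xtwx, xtwy)

def naive_fd_ols(dy, dx, selected):
    """OLS on first differences for units observed in both periods."""
    coef, *_ = np.linalg.lstsq(dx[selected], dy[selected], rcond=None)
    return coef
```

With `h = np.inf` every weight equals one, so the two functions return the same coefficients; a finite bandwidth instead concentrates weight on units whose selection index changes little between periods.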

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(v) = \phi(v)$, where $\phi$ is the density of the standard normal distribution, which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h\,n^{-1/(2(r+1)+1)} = h\,n^{-1/5}$ with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.$^{15}$ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_L$, which in this case will be consistent since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,$^{16}$ denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator,$^{17}$ denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Columns: $\hat\beta_n$ (Without Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD; $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD

Panel A (True $\gamma$): 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B ($\hat\gamma_L$): 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C ($\hat\gamma_{CMS}$): 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D ($\hat\gamma_{SCMS4}$): 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E ($\hat\gamma_{SCMS2}$): 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098
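The grid search that footnote 16 describes for the conditional maximum score (CMS) step can be sketched as below. This is an illustration under assumptions, not the routine used for the paper: the objective is taken to be the sample average of $\Delta d_i\,1\{\Delta w_{i1} g_1 + \Delta w_{i2} g_2 \ge 0\}$, and the function name and arguments are hypothetical.

```python
import numpy as np

def cms_grid_search(dd, dw, n_grid=2000):
    """Grid search for a two-regressor conditional maximum score estimate.

    dd : array (n,) of d_i2 - d_i1, taking values in {-1, 0, 1}
    dw : array (n, 2) of first-differenced selection regressors
    Returns the unit-norm vector (sin(g), cos(g)) with the highest score.
    """
    grid = np.linspace(0.0, 2.0 * np.pi, n_grid)          # equispaced angles
    g = np.column_stack([np.sin(grid), np.cos(grid)])     # (n_grid, 2) candidates
    index = dw @ g.T                                       # (n, n_grid) index values
    score = (dd[:, None] * (index >= 0)).mean(axis=0)     # objective at each angle
    return g[np.argmax(score)]
```

Parameterizing the candidates by an angle imposes the unit-norm scale normalization automatically, which is why a one-dimensional grid suffices for two regressors. The score is a step function of the angle, so `np.argmax` returns the first grid point on the maximizing plateau.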

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E), the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.

$^{15}$ In the construction of the kernel weights of both the infeasible estimator of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

$^{16}$ The CMS estimates are computed by maximizing the objective function $\frac{1}{n}\sum_{i=1}^{n} \Delta d_i\,1\{\Delta w_{i1} g_1 + \Delta w_{i2} g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.

$^{17}$ The SCMS estimates are computed by maximizing

over all $g \in \mathbb{R}^{2}$ that have $|g_2| = 1$ and $g_1$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(v) = K_4(v)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(v) = \Phi(v)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5\,n^{-1/(2m+1)}$ ($m = 2$ or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).
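The claim that the RMSE falls by less than half when the sample size is quadrupled can be checked with a short calculation. The sketch assumes the bandwidth $h_n = h\,n^{-1/(2(r+1)+1)}$ stated in the text and a convergence rate of $n^{-(r+1)/(2(r+1)+1)}$, which equals the $n^{-2/5}$ quoted for a second order kernel ($r = 1$). The function names are hypothetical.

```python
def bandwidth(n, r, h=1.0):
    """Bandwidth h_n = h * n**(-1 / (2*(r+1) + 1))."""
    return h * n ** (-1.0 / (2 * (r + 1) + 1))

def rmse_ratio_when_quadrupling(r):
    """Ratio RMSE(4n) / RMSE(n) implied by the rate n**(-(r+1) / (2*(r+1) + 1))."""
    return 4.0 ** (-(r + 1) / (2 * (r + 1) + 1))

for r in (1, 3, 5):
    # A root-n consistent estimator would give a ratio of exactly 0.5.
    print(r, round(bandwidth(1000, r), 4), round(rmse_ratio_when_quadrupling(r), 4))
```

For $r = 1$ the implied ratio is about 0.57, above the 0.5 that a rate of $n^{-1/2}$ would give. Higher order kernels move the ratio toward 0.5 but never reach it.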

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\hat\beta_n$ and $\hat{\hat\beta}_n$ using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).$^{18}$
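The standard error of a median bias estimate, as footnote 18 describes it, needs an estimate of the estimator's density at its median. A minimal sketch follows, assuming the usual large-sample formula $1/(2 f(m)\sqrt{R})$ for the median of $R$ replications and the rule-of-thumb bandwidth $1.06\,\hat\sigma\,R^{-1/5}$ with a normal kernel. The function name is hypothetical.

```python
import math
import numpy as np

def median_bias_se(estimates):
    """Standard error of the sample median of Monte Carlo estimates.

    Uses 1 / (2 * f(m) * sqrt(R)), where the density f at the median m is
    estimated with a normal kernel and bandwidth 1.06 * sd * R**(-1/5).
    """
    x = np.asarray(estimates, dtype=float)
    R = x.size
    m = np.median(x)
    bw = 1.06 * x.std(ddof=1) * R ** (-0.2)
    z = (x - m) / bw
    f_m = np.exp(-0.5 * z**2).sum() / (R * bw * math.sqrt(2.0 * math.pi))
    return 1.0 / (2.0 * f_m * math.sqrt(R))
```

Subtracting the true parameter value shifts the median but not its sampling variability, so the same number serves as the standard error of the median bias.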

In Panels C and D we use a fourth and a sixth order bias-reducing kernel$^{19}$ and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.
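The bias-reducing property of the fourth and sixth order kernels of footnote 19 can be verified numerically. The sketch below assumes that each kernel is a weighted sum of centered normal densities, with weights (1.1, -0.1) and variances (1, 11) for the fourth order kernel, and weights (1.5, -0.6, 0.1) and variances (1, 4, 9) for the sixth order kernel. The $1/\sqrt{2\pi}$ normalization and all function names are assumptions made here.

```python
import numpy as np

def normal_mixture_kernel(v, weights, variances):
    """Sum of weights[j] * N(0, variances[j]) densities evaluated at v."""
    v = np.asarray(v, dtype=float)
    out = np.zeros_like(v)
    for a, s2 in zip(weights, variances):
        out += a * np.exp(-0.5 * v**2 / s2) / np.sqrt(2.0 * np.pi * s2)
    return out

def k4(v):
    """Fourth order kernel: integrates to one, second moment zero."""
    return normal_mixture_kernel(v, [1.1, -0.1], [1.0, 11.0])

def k6(v):
    """Sixth order kernel: integrates to one, second and fourth moments zero."""
    return normal_mixture_kernel(v, [1.5, -0.6, 0.1], [1.0, 4.0, 9.0])

def moment(kernel, j, lim=60.0, num=600001):
    """Riemann-sum approximation of the integral of v**j * kernel(v)."""
    v = np.linspace(-lim, lim, num)
    return float(np.sum(v**j * kernel(v)) * (v[1] - v[0]))
```

Odd moments vanish by symmetry, so checking the even moments is enough: the weights are chosen so that the weighted variances (and, for the sixth order kernel, the weighted squared variances) sum to zero.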

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h$ is larger than the initial constant (here the initial bandwidth constant is set equal to one$^{20}$). Table V displays the mean of $\hat h$ across 1000 replications for different specifications of the initial constant, for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here, the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

$^{18}$ To estimate the standard errors for the median bias we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

$^{19}$ The fourth-order kernel is $K_4(v) = 1.1\exp(-v^{2}/2) - 0.1\exp(-v^{2}/(2 \cdot 11))(1/\sqrt{11})$ and the sixth-order kernel is $K_6(v) = 1.5\exp(-v^{2}/2) - 0.6\exp(-v^{2}/(2 \cdot 4))(1/\sqrt{4}) + 0.1\exp(-v^{2}/(2 \cdot 9))(1/\sqrt{9})$. See Bierens (1987).

$^{20}$ We chose the initial $h$ equal to one as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size $n = 100000$.

TABLE III

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; TRUE $\gamma$

Columns: $\hat\beta_n$ (Without Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD; $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD

Panel A ($K(v) = \phi(v)$, $h_n = 0.5\,n^{-1/5}$): 0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B ($K(v) = \phi(v)$, $h_n = 3\,n^{-1/5}$): 0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C ($K(v) = K_4(v)$, $h_n = n^{-1/9}$): 0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D ($K(v) = K_6(v)$, $h_n = n^{-1/13}$): 0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

Notes: The estimated standard errors of the mean bias estimates for $n = 250$, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A, and 0.0045, 0.0026, and 0.0014 for Panel B, respectively. The estimated standard errors of the median bias estimates for $n = 250$, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A, and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = \hat h\,n^{-1/5}$, INITIAL $h = 1$, $K(v) = \phi(v)$

Columns: $\hat\beta_n$ (Without Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD; $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD

Panel A (True $\gamma$): 0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B ($\hat\gamma_L$): 0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C ($\hat\gamma_{CMS}$): 0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D ($\hat\gamma_{SCMS4}$): 0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401

TABLE V

Columns: Initial $h = 0.5$, Initial $h = 1$, Initial $h = 2$, Initial $h = 3$

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h_n = n^{-1/5}$, $K(v) = \phi(v)$.

Note: Figures 1a, 1d, 1g: $n = 250$; Figures 1b, 1e, 1h: $n = 1000$; Figures 1c, 1f, 1i: $n = 4000$.
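The points of such a quantile-quantile plot can be computed as in the sketch below. It uses only the Python standard library; the plotting positions $(i + 0.5)/R$ and the function name are choices made here, not details given in the text.

```python
from statistics import NormalDist, mean, stdev

def qq_pairs(estimates):
    """Pairs (normal quantile, sample quantile) for a normal Q-Q plot.

    The reference normal has the sample mean and sample standard deviation
    of the Monte Carlo estimates, as described in the text.
    """
    xs = sorted(estimates)
    R = len(xs)
    ref = NormalDist(mu=mean(xs), sigma=stdev(xs))
    probs = [(i + 0.5) / R for i in range(R)]    # plotting positions in (0, 1)
    return [(ref.inv_cdf(p), x) for p, x in zip(probs, xs)]
```

Points close to the 45-degree line indicate that the normal approximation is adequate. Systematic curvature in the tails would point to skewness or heavy tails.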

Finally, we examine the size of $t$ tests where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$. To this end, we construct $t$ statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (322). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance levels. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the $t$ tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
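The empirical sizes reported in a table of this kind can be computed from the replications as in the following sketch. It assumes two-sided tests with standard normal critical values. The function name and its inputs are hypothetical.

```python
from statistics import NormalDist

def empirical_size(estimates, std_errors, beta0, levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of replications in which |t| exceeds the two-sided normal critical value."""
    t_stats = [(b - beta0) / se for b, se in zip(estimates, std_errors)]
    sizes = {}
    for level in levels:
        crit = NormalDist().inv_cdf(1.0 - level / 2.0)   # e.g. about 1.96 for level 0.05
        sizes[level] = sum(abs(t) > crit for t in t_stats) / len(t_stats)
    return sizes
```

A test with correct size rejects a true null in a fraction of samples close to the nominal level. Rejection fractions well above it indicate standard errors that are too small or a poorly centered estimator.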

TABLE VI

SIZE OF $t$ TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Columns: $\hat\beta_n$ (Without Asymptotic Bias Correction): 0.01, 0.05, 0.10, 0.20; $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction): 0.01, 0.05, 0.10, 0.20

Panel A (True $\gamma$): 0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B ($\hat\gamma_L$): 0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C ($\hat\gamma_{CMS}$): 0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D ($\hat\gamma_{SCMS}$): 0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250

1358 EKATERINI KYRIAZIDOU

5 CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects We developed a two-step estimation procedure for the parameters of the regression equation of interest which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect in a manner similar to the fixed-effects approach taken in linear panel data models The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets However it is quite sensitive to the choice of the bandwidth parameter which suggests that further research on this issue may be warranted Two more issues will be also left for future investigation

First notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation which could be used to develop a Least Absolute Deviations-type estimator This estimator might then be com- bined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations Furthermore LAD estimators might be preferable in the case of heavy-tailed distributions but they do not have closed-form solutions and their asymptotic properties are more difficult to derive

Second although the analysis rested on the strict exogeneity of the explana- tory variables in both equations it is possible to allow for lagged endogenous variables in the set of regressors Honor6 and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors individual effects and lags of the dependent discrete variable Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors individual effects and lags of the dependent endogenous variables

Department of Economics Uniuersity of Chicago 1126 E 59th St Chicago Illinois 60637 U SA

Maizuscrrpt receiced May 1994 final reL ision receiced January 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas which maintain Assumptions R4 and R8 of Section 3

LEMMAAl Let S = is a random sam- (ln)Z=l ( l h )L (M h )Z v s 2 0 where (Z y)]= ple from a disirrbuiron that has E ( I Z I ~ I for L ~at~sfiesW )ltM lt almost all W and the functron

P ~ l v(v ) l d v lt M Then E ( S ) = O ( k i ) and var(S) = O(hnh) Tlzus for s 2 1 S + 0 while for

P s = 0 S +f(O)E(ZI W = O)lL(v)d v procrded that E ( Z I W ) rs contrnuo~ts at W = 0


PROOF: Random sampling implies that

Under our assumptions, and by bounded convergence, we obtain

The stated probability limits then obtain by Chebyshev's theorem.
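A sketch of the moment calculation behind this proof, written in the notation of Lemma A1, is given below. It assumes in addition that the density $f_W$ is bounded and that $\int v^{2s} L(v)^{2}\,dv < \infty$; these are assumptions of the sketch and should not be read as the paper's own displays.

```latex
\begin{aligned}
E(S_n) &= \int \frac{1}{h_n}\, L\!\left(\frac{W}{h_n}\right) W^{s}\, E(Z \mid W)\, f_W(W)\, dW \\
       &= h_n^{s} \int L(v)\, v^{s}\, E(Z \mid W = v h_n)\, f_W(v h_n)\, dv = O(h_n^{s}), \\
\mathrm{var}(S_n) &\le \frac{1}{n}\, E\!\left[\frac{1}{h_n^{2}}\, L\!\left(\frac{W}{h_n}\right)^{2} Z^{2}\, W^{2s}\right] \\
       &= \frac{h_n^{2s}}{n h_n} \int L(v)^{2}\, v^{2s}\, E(Z^{2} \mid W = v h_n)\, f_W(v h_n)\, dv
        = O\!\left(\frac{h_n^{2s}}{n h_n}\right).
\end{aligned}
```

Both equalities use the change of variables $W = v h_n$. For $s = 0$, letting $h_n \to 0$ inside the first integral gives $f_W(0)\,E(Z \mid W = 0)\int L(v)\,dv$ when $E(Z \mid W)\,f_W(W)$ is continuous at $W = 0$.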

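LEMMA A2 (Liapounov CLT for double arrays): Let $S_n = (1/\sqrt{n})\sum_{i=1}^{n}\xi_{in}$, where $\{\xi_{in}\}_{i=1}^{n}$ is an independent sequence of scalar random variables that satisfies $E(\xi_{in}) = 0$, $\mathrm{var}(\xi_{in}) < \infty$, $\mathrm{var}(S_n) \to V < \infty$, and $\sum_{i=1}^{n} E|\xi_{in}/\sqrt{n}|^{2+\delta} \to 0$ for some $\delta \in (0,1)$, as $n \to \infty$. Then $S_n \xrightarrow{d} N(0, V)$.

PROOF: See Theorem 7.1.2 and the comment on page 209 in Chung (1974).

COROLLARY A1: Let $\xi_{in} = (1/\sqrt{h_n})\,L(W_i/h_n)\,Z_i$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution such that $E(Z \mid W) = 0$ and $E(|Z|^{2+\delta} \mid W) < M < \infty$ for almost all $W$, $E(Z^{2} \mid W)$ is continuous at $W = 0$, and the function $L$ satisfies $\int |L(v)|^{2+\delta}\,dv < \infty$. Then $(1/\sqrt{n})\sum_{i=1}^{n}\xi_{in} \xrightarrow{d} N\big(0,\; f_W(0)\,E(Z^{2} \mid W = 0)\int L(v)^{2}\,dv\big)$.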

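PROOF OF LEMMA 1: (a) Apply Lemma A1 with $Z_i = \Delta x_i^{j}\,\Delta x_i^{l}\,\Phi_i$ ($j, l = 1, \dots, k$), $s = 0$, and $L(v) = K(v)$.

(b-i) Apply Lemma A2 with $\xi_{in} = c'(1/\sqrt{h_n})\,K(W_i/h_n)\,\Delta x_i'\,\Delta\varepsilon_i$, where $c$ is a $k \times 1$ vector of constants such that $c'c = 1$.

(b-ii) Note that by Assumption R5, $\Delta\lambda_i = \Lambda_i W_i$. Thus we may write

$$S_{x\lambda} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}\,K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\,\Lambda_i\,W_i .$$

Therefore $E(S_{x\lambda}) = \int (1/h_n)\,K(W/h_n)\,W\,g(W)\,dW$, where $g(W) \equiv E(\Delta x'\,\Lambda \mid W)\,f_W(W)$ is by assumption $r$ times continuously differentiable with derivatives that are bounded on the support of $W$, and has $g(0) < \infty$. A Taylor series expansion of $g(\cdot)$ around 0 and a change of variables $W = v h_n$ lead to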


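for some $c_n^{*}$ lying between 0 and $W$, since $\int v^{j} K(v)\,dv = 0$ for $j = 1, \dots, r$. Therefore, by bounded convergence,

since under our assumptions $\int |v|^{r+1}|K(v)|\,dv < \infty$ and by assumption $\sqrt{n h_n}\,h_n^{r+1} \to \tilde h$. Furthermore, by Lemma A1, $\mathrm{var}(S_{x\lambda}) = O(h_n^{2}/(n h_n))$, which implies that $\mathrm{var}(\sqrt{n h_n}\,S_{x\lambda}) = O(n h_n)\,O(h_n/n) = O(h_n^{2}) = o(1)$. Hence $\sqrt{n h_n}\,S_{x\lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\lambda}$.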

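(c-i) Note that $E(S_{x\varepsilon}) = 0$, while by Lemma A1, $\mathrm{var}(S_{x\varepsilon}) = O((n h_n)^{-1})$. Therefore $E(h_n^{-(r+1)} S_{x\varepsilon}) = 0$ and $\mathrm{var}(h_n^{-(r+1)} S_{x\varepsilon}) = O(h_n^{-2(r+1)}(n h_n)^{-1}) = O((n h_n^{2(r+1)+1})^{-1}) = o(1)$, since by assumption $\sqrt{n h_n}\,h_n^{r+1} \to \infty$ as $n \to \infty$. Thus $h_n^{-(r+1)} S_{x\varepsilon} \xrightarrow{p} 0$.

(c-ii) From part (b-ii) above,

and

since $n h_n^{2(r+1)+1} \to \infty$ implies that $n h_n \to \infty$. Thus $h_n^{-(r+1)} S_{x\lambda} \xrightarrow{p} \Sigma_{x\lambda}$.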

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$.

(iii) In the Taylor series expansions, $c_{in}^{*}$ stands for a generic intermediate value between the two points of the expansion.

PROOF OF LEMMA 2: (a) By a Taylor series expansion we can write

Therefore

since by assumption $\mu < p/2$, $|K'(v)| < M$, and $E(\|\Delta w\|\,\|\Delta x\|^{2}) < \infty$.


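(b-i) Let $\hat S^{l}_{x\varepsilon}$ and $S^{l}_{x\varepsilon}$ denote the $l$th ($l = 1, \dots, k$) elements of $\hat S_{x\varepsilon}$ and $S_{x\varepsilon}$, respectively. A third order Taylor series expansion yields

$$\sqrt{n h_n}\,(\hat S^{l}_{x\varepsilon} - S^{l}_{x\varepsilon}) = A_1\,\frac{(\hat\gamma_n - \gamma)}{h_n} + \frac{1}{2}\,\frac{(\hat\gamma_n - \gamma)'}{h_n}\,A_2\,\frac{(\hat\gamma_n - \gamma)}{h_n} + A_3, \qquad A_3 \equiv \frac{1}{\sqrt{n h_n}}\,\frac{1}{6}\sum_{i=1}^{n} K'''(c_{in}^{*})\,\Delta x_i^{l}\,\Delta\varepsilon_i\left(\frac{\Delta w_i(\hat\gamma_n - \gamma)}{h_n}\right)^{3}.$$

We will show that $A_1$ and $A_2$ are $O_p(1)$ while $A_3 = o_p(1)$. The desired result will then follow from the fact that $\mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$.

Let $A_1^{j}$ be the $j$th element ($j = 1, \dots, q$) of the $(1 \times q)$ vector $A_1$. Write $A_1^{j} = (1/\sqrt{n})\sum_{i=1}^{n}\xi_{in}$, where $\xi_{in} = (1/\sqrt{h_n})\,K'(W_i/h_n)\,\Delta x_i^{l}\,\Delta\varepsilon_i\,\Delta w_i^{j}$. Note that $\{\xi_{in}\}_{i=1}^{n}$ is a sequence of scalar random variables that satisfies the requirements of Lemma A2, since under our assumptions $E(|\Delta x^{l}\,\Delta\varepsilon\,\Delta w^{j}|^{2+\delta} \mid W) < M$ for almost all $W$, while $|K'(v)| < M$ and $\int |K'(v)|\,dv < \infty$ imply that $\int |K'(v)|^{2+\delta}\,dv < \infty$. Therefore $A_1^{j}$ is bounded in probability.

Similarly, we can show that the $jm$th element ($j, m = 1, \dots, q$) of the $(q \times q)$ matrix $A_2$ is also bounded in probability, by defining $\xi_{in} = (1/\sqrt{h_n})\,K''(W_i/h_n)\,\Delta x_i^{l}\,\Delta\varepsilon_i\,\Delta w_i^{j}\,\Delta w_i^{m}$, since $E(|\Delta x^{l}\,\Delta\varepsilon\,\Delta w^{j}\,\Delta w^{m}|^{2+\delta} \mid W) < M$ for almost all $W$, and the boundedness and absolute integrability of $K''(v)$ imply that $\int |K''(v)|^{2+\delta}\,dv < \infty$.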

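Next observe that

$$|A_3| \le M\,\frac{\sqrt{n}}{h_n^{7/2}}\,\|\hat\gamma_n - \gamma\|^{3}\,\frac{1}{n}\sum_{i=1}^{n}\|\Delta w_i\|^{3}\,|\Delta x_i^{l}\,\Delta\varepsilon_i| = O_p\!\left(n^{1/2 + 7\mu/2 - 3p}\right) = o_p(1),$$

since $p > 2/5$ and $\mu < p/2$ imply that $(1/2) + (7\mu/2) - 3p < 0$.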

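(b-ii) Let $\hat S^{l}_{x\lambda}$ and $S^{l}_{x\lambda}$ denote the $l$th ($l = 1, \dots, k$) elements of $\hat S_{x\lambda}$ and $S_{x\lambda}$, respectively. A third order Taylor series expansion yields

$$\sqrt{n h_n}\,(\hat S^{l}_{x\lambda} - S^{l}_{x\lambda}) = B_1\,\sqrt{n h_n}\,(\hat\gamma_n - \gamma) + \frac{1}{2}\,\sqrt{n h_n}\,(\hat\gamma_n - \gamma)'\,B_2\,\frac{(\hat\gamma_n - \gamma)}{h_n} + B_3, \qquad B_3 \equiv \frac{1}{\sqrt{n h_n}}\,\frac{1}{6}\sum_{i=1}^{n} K'''(c_{in}^{*})\,\Delta x_i^{l}\,\Delta\lambda_i\left(\frac{\Delta w_i(\hat\gamma_n - \gamma)}{h_n}\right)^{3}.$$

We will show that $B_1$ and $B_2$ are $O_p(1)$ while $B_3 = o_p(1)$. The desired result will then follow from the fact that $1 - 2p < \mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$ and $\sqrt{n h_n}\,(\hat\gamma_n - \gamma) = O_p(n^{(1-\mu)/2 - p}) = o_p(1)$.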


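Note that $B_1$ is a $(1 \times q)$ row-vector. For its $j$th element, application of Lemma A1 with $s = 1$, $Z_i = \Delta x_i^{l}\,\Lambda_i\,\Delta w_i^{j}$, and $L(v) = K'(v)$ yields

$$E(B_1^{j}) = \frac{1}{h_n}\,O(h_n) = O(1) \quad\text{and}\quad \mathrm{var}(B_1^{j}) = \frac{1}{h_n^{2}}\,O\!\left(\frac{h_n^{2}}{n h_n}\right) = o(1),$$

since $E((\Delta x^{l})^{2}\,\Lambda^{2}\,(\Delta w^{j})^{2} \mid W) < \infty$ for almost all $W$ and $\int |v K'(v)|^{2}\,dv < \infty$.

Similarly, we can show that the $jm$th element ($j, m = 1, \dots, q$) of the $(q \times q)$ matrix $B_2$ is also bounded in probability, since $E((\Delta x^{l})^{2}\,\Lambda^{2}\,(\Delta w^{j})^{2}(\Delta w^{m})^{2} \mid W) < \infty$ for almost all $W$ and $\int |v K''(v)|\,dv < \infty$.

Next observe that $B_3 = o_p(1)$, since under our assumptions $(1/2) + (7\mu/2) - 3p < 0$, $\gamma$ lies in a compact set, and $E(\|\Delta x\|\,\|\Delta w\|^{4}) < \infty$.

(c-i) Note that with $h_n = h \cdot n^{-\mu}$, the condition $n h_n^{2(r+1)+1} \to \infty$ implies that $\mu < 1/(2(r+1)+1)$.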

In what follows we will use the fact that for $r \ge 1$

Define $\hat S^{l}_{x\varepsilon}$ and $S^{l}_{x\varepsilon}$ as before. A third order Taylor series expansion yields

$$h_n^{-(r+1)}(\hat S^{l}_{x\varepsilon} - S^{l}_{x\varepsilon}) = \frac{1}{\sqrt{n h_n}\,h_n^{r+1}}\,A_1\,\frac{(\hat\gamma_n - \gamma)}{h_n} + \frac{1}{2}\,\frac{1}{\sqrt{n h_n}\,h_n^{r+1}}\,\frac{(\hat\gamma_n - \gamma)'}{h_n}\,A_2\,\frac{(\hat\gamma_n - \gamma)}{h_n} + A_4,$$

where $A_4$ denotes the third order term and $A_1$ and $A_2$ are defined as in the proof of part (b-i). As we showed there, both these quantities are bounded in probability for any $h_n$ that satisfies $h_n \to 0$ and $n h_n \to \infty$ as $n$ increases. Furthermore, from (1) above, $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$. Thus, the first two terms of the sum above are $o_p(1)$. Now, by (2),

(c-ii) Let $\hat S^{l}_{x\lambda}$ and $S^{l}_{x\lambda}$ be defined as before. A third order Taylor series expansion yields

where $B_1$ and $B_2$ are defined as in the proof of part (b-ii) and, as we showed there, they are bounded in probability for any $h_n$ that satisfies $n h_n \to \infty$ as $n$ increases. Thus, the first two terms of the sum above are $o_p(1)$. Furthermore,

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics, Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

----- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons--A Selectivity Bias," Journal of Political Economy, 82, 1119-1144.

HARDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

----- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

----- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

----- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

----- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

----- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

----- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

----- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

----- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.


TABLE I

Panel A: Finite Sample Properties of $\hat\beta_{NAIVE}$ (columns: Mean Bias, Median Bias, RMSE, MAD)

Panel B: Sizes of t tests (nominal levels: 0.01, 0.05, 0.10, 0.20)

bias, and the median absolute deviation (MAD). Panel B reports the number of rejections of the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$, at the 1, 5, 10, and 20 percent significance levels. Both panels confirm that the estimator is inconsistent.

Table II presents the finite sample properties of the proposed two-step estimator. The left-hand-side panels are for $\hat\beta_n$, obtained by specifying $r = 1$ and using $K(v) = \phi(v)$, where $\phi$ is the density of the standard normal distribution,

TABLE II

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND ITS BIAS-CORRECTED VERSION; $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction) and the bias-corrected estimator (With Asymptotic Bias Correction); within each group: Mean Bias, Median Bias, RMSE, MAD.

Panel A, True $\gamma$: 0.2427 0.1625 0.0018 0.1368 0.0924 0.0078 0.0792 0.0511 0.0024

Panel B, $\hat\gamma_L$: 0.2076 0.1438 0.0145 0.1169 0.0778 0.0117 0.0672 0.0455 0.0059

Panel C, $\hat\gamma_{CMS}$: 0.2592 0.1725 -0.0021 0.1435 0.0950 -0.0026 0.0826 0.0544 -0.0005

Panel D, $\hat\gamma_{SCMS4}$: 0.1780 0.1255 0.0327 0.1063 0.0703 0.0106 0.0629 0.0410 -0.0139

Panel E, $\hat\gamma_{SCMS2}$: 0.1765 0.1242 0.0361 0.1071 0.0721 0.0146 0.0659 0.0416 -0.0098

which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h \cdot n^{-1/(2(r+1)+1)} = h \cdot n^{-1/5}$ with $h = 1$. The panels on the right-hand side present the results for the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.$^{15}$ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_L$, which in this case will be consistent since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,$^{16}$ denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator, denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.
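To make the role of the kernel weights concrete, the following is a minimal sketch of a kernel-weighted least squares regression on first differences. It is not the paper's implementation. It assumes a scalar regressor, a standard normal kernel, a known first-differenced selection index, and the bandwidth $h_n = h \cdot n^{-1/5}$ with $h = 1$. The data are hypothetical: the true coefficient is 1 and the selection term is taken to be proportional to the index difference, so that the unweighted estimator is visibly biased.

```python
import math
import random


def normal_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def kernel_weighted_fd(dx, dy, dindex, both_selected, h):
    """Kernel-weighted least squares on first differences (scalar regressor).

    dindex holds the first-differenced selection index; observations with a
    small |dindex| receive the largest weight.
    """
    num = den = 0.0
    for dxi, dyi, wi, sel in zip(dx, dy, dindex, both_selected):
        if not sel:
            continue  # only individuals observed in both periods contribute
        psi = normal_pdf(wi / h) / h
        num += psi * dxi * dyi
        den += psi * dxi * dxi
    return num / den


# Hypothetical data: beta = 1 plus a selection term equal to 2 * dindex.
rng = random.Random(1)
n = 5000
dindex = [rng.gauss(0.0, 1.0) for _ in range(n)]
dx = [w + rng.gauss(0.0, 1.0) for w in dindex]
dy = [1.0 * x + 2.0 * w for x, w in zip(dx, dindex)]
selected = [True] * n

h_n = 1.0 * n ** (-1.0 / 5.0)  # h_n = h * n^(-1/5) with h = 1
beta_kernel = kernel_weighted_fd(dx, dy, dindex, selected, h_n)
beta_naive = sum(x * y for x, y in zip(dx, dy)) / sum(x * x for x in dx)
print(f"kernel-weighted: {beta_kernel:.3f}, unweighted: {beta_naive:.3f}")
```

In this design the unweighted estimate is pulled toward 2, while the kernel-weighted estimate stays close to 1 because it relies on observations whose index difference is near zero.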

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size since the estimator is consistent, at a rate slower than $n^{-1/2}$, as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E), the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window

15 In the construction of the kernel weights of both the infeasible estimator $\hat\beta_n$ of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n)\sum_{i=1}^{n}\Delta d_i\,1\{\Delta w_{i1} g_1 + \Delta w_{i2} g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.

17 The SCMS estimates are computed by maximizing

over all $g \in \mathbb{R}^{2}$ that have $|g_2| = 1$ and $g_1$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(v) = K_4(v)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(v) = \Phi(v)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$, and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5\,n^{-1/(2m+1)}$ ($m = 2$ or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details, see Horowitz (1992) and Kyriazidou (1994).
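The grid search described in footnote 16 can be sketched as follows. This is an illustrative reading and not the routine used in the paper. It assumes two index variables, an objective of the form $(1/n)\sum_i \Delta d_i\,1\{\Delta w_{i1} g_1 + \Delta w_{i2} g_2 \ge 0\}$, and a grid of angles on $[0, 2\pi)$. The function name `cms_grid_search` and the noise-free test data are hypothetical.

```python
import math
import random


def cms_grid_search(delta_d, delta_w1, delta_w2, grid_points=2000):
    """Maximize (1/n) * sum_i delta_d_i * 1{dw1_i*g1 + dw2_i*g2 >= 0} over unit-norm (g1, g2)."""
    n = len(delta_d)
    best_value, best_g = -math.inf, None
    for k in range(grid_points):
        angle = 2.0 * math.pi * k / grid_points  # equispaced angles in [0, 2*pi)
        g1, g2 = math.sin(angle), math.cos(angle)
        value = sum(
            d for d, w1, w2 in zip(delta_d, delta_w1, delta_w2)
            if w1 * g1 + w2 * g2 >= 0.0
        ) / n
        if value > best_value:
            best_value, best_g = value, (g1, g2)
    return best_g, best_value


# Hypothetical noise-free data with true direction (sin(0.7), cos(0.7)).
rng = random.Random(2)
true_g = (math.sin(0.7), math.cos(0.7))
dw1 = [rng.gauss(0.0, 1.0) for _ in range(400)]
dw2 = [rng.gauss(0.0, 1.0) for _ in range(400)]
dd = [1 if a * true_g[0] + b * true_g[1] >= 0.0 else -1 for a, b in zip(dw1, dw2)]

g_hat, objective = cms_grid_search(dd, dw1, dw2)
print(g_hat, objective)
```

Parameterizing the candidate vector by an angle imposes the unit-norm scale normalization automatically, which is why a one-dimensional grid suffices when there are two index variables.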


width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.
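The remark that the RMSE falls by less than half when the sample size is quadrupled can be checked with a line of arithmetic. The sketch below compares the factor implied by the parametric rate $n^{-1/2}$ with the factor implied by the rate $n^{-2/5}$ quoted for a second order kernel; the helper name `rmse_factor` is hypothetical.

```python
def rmse_factor(rate_exponent, sample_multiple=4.0):
    """Factor by which an O(n**(-rate_exponent)) error shrinks when n is multiplied."""
    return sample_multiple ** (-rate_exponent)


root_n_factor = rmse_factor(0.5)      # parametric rate n^(-1/2)
kernel_factor = rmse_factor(2.0 / 5)  # rate n^(-2/5) for a second order kernel
print(f"n^(-1/2): {root_n_factor:.3f}, n^(-2/5): {kernel_factor:.3f}")
```

A root-n consistent estimator would halve its RMSE, whereas at rate $n^{-2/5}$ the RMSE only shrinks to roughly 57 percent of its previous value.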

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\hat\beta_n$ and its bias-corrected version using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).$^{18}$

In Panels C and D we use a fourth and a sixth order bias-reducing kernel$^{19}$ and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.
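The bias-reducing property of such kernels can be verified numerically. The sketch below assumes the kernels are mixtures of normal densities with weights $(1.1, -0.1)$ and variances $(1, 11)$ for the fourth order kernel, and weights $(1.5, -0.6, 0.1)$ and variances $(1, 4, 9)$ for the sixth order kernel, as we read footnote 19. The $1/\sqrt{2\pi}$ normalization is an assumption added here so that each kernel integrates to one.

```python
import math


def phi(v, scale=1.0):
    """Density of N(0, scale**2) evaluated at v."""
    return math.exp(-0.5 * (v / scale) ** 2) / (scale * math.sqrt(2.0 * math.pi))


def k4(v):
    return 1.1 * phi(v) - 0.1 * phi(v, math.sqrt(11.0))


def k6(v):
    return 1.5 * phi(v) - 0.6 * phi(v, 2.0) + 0.1 * phi(v, 3.0)


def moment(kernel, power, lo=-60.0, hi=60.0, steps=24_000):
    """Trapezoid-rule approximation of the integral of v**power * kernel(v)."""
    width = (hi - lo) / steps
    total = 0.5 * (lo ** power * kernel(lo) + hi ** power * kernel(hi))
    for i in range(1, steps):
        v = lo + i * width
        total += v ** power * kernel(v)
    return total * width


for name, kernel in (("fourth order", k4), ("sixth order", k6)):
    print(name, [moment(kernel, j) for j in (0, 2, 4, 6)])
```

A kernel is of order $r + 1$ when its moments of orders $1, \dots, r$ vanish. Odd moments are zero by symmetry, so it is enough to inspect the even moments: the second moment vanishes for both kernels, and the fourth moment vanishes only for the sixth order kernel.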

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h$ is larger than the initial constant (here the initial bandwidth constant is set equal to one$^{20}$). Table V displays the mean of $\hat h$ across 1000 replications for different specifications of the initial constant, for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here, the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias, we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

19 The fourth-order kernel is $K(v) = 1.1\exp(-v^{2}/2) - 0.1\exp(-v^{2}/(2 \cdot 11))(1/\sqrt{11})$ and the sixth-order kernel is $K(v) = 1.5\exp(-v^{2}/2) - 0.6\exp(-v^{2}/(2 \cdot 4))(1/\sqrt{4}) + 0.1\exp(-v^{2}/(2 \cdot 9))(1/\sqrt{9})$. See Bierens (1987).

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size $n = 100000$.
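The calculation described in footnote 18 can be sketched as follows. The sketch assumes the usual asymptotic standard error of a sample median, $1/(2 f(m)\sqrt{R})$ for $R$ replications with density $f$ at the median $m$, and assumes that the rule-of-thumb bandwidth referred to is $1.06\,\hat\sigma R^{-1/5}$. The simulated draws stand in for Monte Carlo estimates and are hypothetical.

```python
import math
import random
import statistics


def median_standard_error(draws):
    """Asymptotic standard error of the sample median, 1 / (2 * f(m) * sqrt(R)),
    with the density f at the median m estimated by a normal-kernel estimator."""
    r = len(draws)
    bandwidth = 1.06 * statistics.stdev(draws) * r ** (-1.0 / 5.0)  # assumed rule of thumb
    m = statistics.median(draws)
    density = sum(
        math.exp(-0.5 * ((m - x) / bandwidth) ** 2) for x in draws
    ) / (r * bandwidth * math.sqrt(2.0 * math.pi))
    return 1.0 / (2.0 * density * math.sqrt(r))


rng = random.Random(3)
replications = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for 1000 estimates
se_median = median_standard_error(replications)
se_mean = statistics.stdev(replications) / math.sqrt(len(replications))
print(f"se(median) = {se_median:.4f}, se(mean) = {se_mean:.4f}")
```

For approximately normal draws the median's standard error exceeds that of the mean by a factor near $\sqrt{\pi/2}$, which is the pattern one would expect to see between the two sets of standard errors reported under Table III.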


TABLE III

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND ITS BIAS-CORRECTED VERSION; TRUE $\gamma$ (a)

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction) and the bias-corrected estimator (With Asymptotic Bias Correction); within each group: Mean Bias, Median Bias, RMSE, MAD.

Panel A, $K(v) = \phi(v)$, $h_n = 0.5 \cdot n^{-1/5}$: 0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B, $K(v) = \phi(v)$, $h_n = 3 \cdot n^{-1/5}$: 0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C, $K(v) = K_4(v)$, $h_n = n^{-1/9}$: 0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D, $K(v) = K_6(v)$, $h_n = n^{-1/13}$: 0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

(a) The estimated standard errors of the mean bias estimates for $n$ = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A and 0.0045, 0.0026, and 0.0014 for Panel B, respectively. The estimated standard errors of the median bias estimates for $n$ = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND ITS BIAS-CORRECTED VERSION; $h_n = \hat h \cdot n^{-1/5}$, INITIAL $h = 1$, $K(v) = \phi(v)$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction) and the bias-corrected estimator (With Asymptotic Bias Correction); within each group: Mean Bias, Median Bias, RMSE, MAD.

Panel A, True $\gamma$: 0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B, $\hat\gamma_L$: 0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C, $\hat\gamma_{CMS}$: 0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D, $\hat\gamma_{SCMS4}$: 0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401


TABLE V

Columns: Initial $h = 0.5$, Initial $h = 1$, Initial $h = 2$, Initial $h = 3$

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the

FIGURE 1: Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h_n = n^{-1/5}$, $K(v) = \phi(v)$. Panels 1a-1c: True $\gamma$. Note: Figures 1a, 1d, 1g: $n = 250$; Figures 1b, 1e, 1h: $n = 1000$; Figures 1c, 1f, 1i: $n = 4000$.

estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.
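The construction of such a quantile-quantile comparison can be sketched as follows. The sketch assumes that each ordered replication is paired with the quantile of a normal distribution having the sample mean and sample variance of the replications, evaluated at the plotting position $(i + 0.5)/R$. The plotting position and the simulated replications are assumptions made for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist


def qq_points(draws):
    """Pair each ordered draw with the matching quantile of a normal distribution
    that has the same mean and variance as the draws."""
    ordered = sorted(draws)
    r = len(ordered)
    reference = NormalDist(statistics.mean(ordered), statistics.stdev(ordered))
    normal_q = [reference.inv_cdf((i + 0.5) / r) for i in range(r)]
    return normal_q, ordered


def pearson(xs, ys):
    """Sample correlation coefficient, used here as a rough measure of linearity."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(4)
estimates = [1.0 + 0.1 * rng.gauss(0.0, 1.0) for _ in range(1000)]  # hypothetical replications
normal_q, sample_q = qq_points(estimates)
print(f"correlation of the Q-Q points: {pearson(normal_q, sample_q):.4f}")
```

Points lying close to the 45-degree line, or equivalently a correlation close to one, indicate that the normal approximation is adequate for the replications at hand.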

Finally, we examine the size of t tests, where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$. To this end, we construct t statistics for $\hat\beta_n$ and its bias-corrected version for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and of its bias-corrected version) does not obtain in this case, due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
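The empirical size reported in such a table is simply a rejection frequency across replications. A minimal sketch follows, assuming two-sided tests with standard normal critical values. The function name `empirical_sizes` and the simulated estimates and standard errors are hypothetical.

```python
import random
from statistics import NormalDist


def empirical_sizes(estimates, std_errors, true_value, levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of replications whose two-sided t test rejects the true value at each level."""
    sizes = {}
    for level in levels:
        critical = NormalDist().inv_cdf(1.0 - level / 2.0)
        rejections = sum(
            abs((b - true_value) / se) > critical for b, se in zip(estimates, std_errors)
        )
        sizes[level] = rejections / len(estimates)
    return sizes


rng = random.Random(5)
estimates = [1.0 + 0.1 * rng.gauss(0.0, 1.0) for _ in range(20_000)]  # hypothetical draws
std_errors = [0.1] * len(estimates)
sizes = empirical_sizes(estimates, std_errors, true_value=1.0)
print(sizes)
```

When the estimator is exactly normal and the standard errors are correct, as in this artificial example, the rejection frequencies match the nominal levels up to simulation noise; departures in an actual Monte Carlo study reflect finite sample bias or imprecise standard errors.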

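TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND ITS BIAS-CORRECTED VERSION; $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction) and the bias-corrected estimator (With Asymptotic Bias Correction); within each group, nominal levels: 0.01, 0.05, 0.10, 0.20.

Panel A, True $\gamma$: 0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B, $\hat\gamma_L$: 0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C, $\hat\gamma_{CMS}$: 0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D, $\hat\gamma_{SCMS}$: 0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250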

1358 EKATERINI KYRIAZIDOU

5 CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects We developed a two-step estimation procedure for the parameters of the regression equation of interest which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect in a manner similar to the fixed-effects approach taken in linear panel data models The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets However it is quite sensitive to the choice of the bandwidth parameter which suggests that further research on this issue may be warranted Two more issues will be also left for future investigation

First notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation which could be used to develop a Least Absolute Deviations-type estimator This estimator might then be com- bined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations Furthermore LAD estimators might be preferable in the case of heavy-tailed distributions but they do not have closed-form solutions and their asymptotic properties are more difficult to derive

Second although the analysis rested on the strict exogeneity of the explana- tory variables in both equations it is possible to allow for lagged endogenous variables in the set of regressors Honor6 and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors individual effects and lags of the dependent discrete variable Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors individual effects and lags of the dependent endogenous variables

Department of Economics Uniuersity of Chicago 1126 E 59th St Chicago Illinois 60637 U SA

Maizuscrrpt receiced May 1994 final reL ision receiced January 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas which maintain Assumptions R4 and R8 of Section 3

LEMMAAl Let S = is a random sam- (ln)Z=l ( l h )L (M h )Z v s 2 0 where (Z y)]= ple from a disirrbuiron that has E ( I Z I ~ I for L ~at~sfiesW )ltM lt almost all W and the functron

P ~ l v(v ) l d v lt M Then E ( S ) = O ( k i ) and var(S) = O(hnh) Tlzus for s 2 1 S + 0 while for

P s = 0 S +f(O)E(ZI W = O)lL(v)d v procrded that E ( Z I W ) rs contrnuo~ts at W = 0

SAMPLE SELECTION MODEL

PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshevs theorem

LEMMAA2 (Liapounov CLT for doublc arrays) Let = (1 l t l )~= I tiwhere an Independent sequence of scalar random ~arrables that satis$es E( (I0 var( (I lt rn var(= +

V lt aand I3= El ( 61 +0 for some 8 E (01) as n + Then Jizh~N(0 V)

PROOF See Theorem 712 and comment on pagc 209 in Chung (1973)

COROLLARY = where (Z U)l= 1s a random sample from a Al Let ( ( I amp)L(w~)z d~stnbutlonsuch that E(ZI W) = 0 and E(IZI 1 W) lt M lt w for almost all W E(Z2 I W) IS conhnuous at W = 0 and the functlon L satrsfies llL(v)l dv lt 53 Then KS= ( l ix)~l=amp N(0

f W ( 0 ) ~ ( Z 2 I ~ =O ) ~ L ( V ) ~ ~ V )

PROOFOF LEMMA1 (a) Apply Lemma A1 with 2= Ax Ax di (1 j = I k) s = 0 and L(v ) = K(v)

(b-i) Apply Lemma A2 with tt= c1(1 amp)K(Uh) Ax Ac where c is a k X 1 vector of constants such that cc = 1

(b-ii) Note that by Assumption R5 Ah = AWThus wc may write

S = Ax( 1 ~ 1 ) I 3 ~ = ~ ( l h ) K ( H ( h )

Therefore E(S) = l(lh)K(Wh)Wg(W) dW where g(W) - E(Axr AlW)fw(W) is by assumption r times colltinuously differentiable with derivatives that are bounded on the support of W and has g(0) lt m A Taylor series expansion of g() around 0 and a change of variables W = vh lead to

17

1

1360 EKATERINI KYRIAZIDOU

for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKSii) In what follows A4 stands for a generic constant which is the uppcr bound of certain quantities

(ii) We define the matrix norm IIAll= dtrace(AA) (iii) In the Taylor series expansions c stands for a generic value between U and

PROOFOF LEMMA2 (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a

SAMPLE SELECTION MODEL 1361

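(b-i) Let $\hat S^{l}_{x\varepsilon}$ and $S^{l}_{x\varepsilon}$ denote the $l$th ($l = 1, \ldots, k$) elements of $\hat S_{x\varepsilon}$ and $S_{x\varepsilon}$, respectively. A third order Taylor series expansion yields

$$\sqrt{n h_n}\,(\hat S^{l}_{x\varepsilon} - S^{l}_{x\varepsilon}) = A_1 \frac{(\hat\gamma_n - \gamma)}{h_n} + \frac{1}{2} \frac{(\hat\gamma_n - \gamma)'}{h_n} A_2 \frac{(\hat\gamma_n - \gamma)}{h_n} + A_3,$$

$$A_3 = \frac{1}{6} \frac{1}{\sqrt{n h_n}} \sum_{i=1}^n K'''(c_i^*)\,\Delta x_i^{l}\,\Delta\varepsilon_i\,\Phi_i\,\frac{(\Delta w_i(\hat\gamma_n - \gamma))^3}{h_n^3}.$$

We will show that $A_1$ and $A_2$ are $O_p(1)$ while $A_3 = o_p(1)$. The desired result will then follow from the fact that $\mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$.

Let $A_1^{j}$ be the $j$th element ($j = 1, \ldots, q$) of the $(1 \times q)$ vector $A_1$. Write $A_1^{j} = (1/\sqrt{n}) \sum_{i=1}^n \xi_{in}$, where $\xi_{in} = (1/\sqrt{h_n})\,K'(W_i/h_n)\,\Delta x_i^{l}\,\Delta\varepsilon_i\,\Phi_i\,\Delta w_i^{j}$. Note that $\{\xi_{in}\}_{i=1}^n$ is a sequence of scalar random variables that satisfies the requirements of Lemma A2, since under our assumptions $E(|\Delta x^{l}\,\Delta\varepsilon\,\Delta w^{j}|^{2+\delta} \mid W) < M$ for almost all $W$, while $|K'(v)| < M$ and $\int |K'(v)|\,dv < \infty$ imply that $\int |K'(v)|^{2+\delta}\,dv < \infty$. Therefore $A_1^{j}$ is bounded in probability.

Similarly, we can show that the $jm$th element ($j, m = 1, \ldots, q$) of the $(q \times q)$ matrix $A_2$ is also bounded in probability, by defining $\xi_{in} = (1/\sqrt{h_n})\,K''(W_i/h_n)\,\Delta x_i^{l}\,\Delta\varepsilon_i\,\Phi_i\,\Delta w_i^{j}\,\Delta w_i^{m}$, since $E(|\Delta x^{l}\,\Delta\varepsilon\,\Delta w^{j}\,\Delta w^{m}|^{2+\delta} \mid W) < M$ for almost all $W$, and the boundedness and absolute integrability of $K''(v)$ imply that $\int |K''(v)|^{2+\delta}\,dv < \infty$.

Next observe that

$$\|A_3\| \le M \sqrt{n}\,\frac{1}{h_n^{7/2}}\,\|\hat\gamma_n - \gamma\|^3\,\frac{1}{n} \sum_{i=1}^n \|\Delta x_i\|\,|\Delta\varepsilon_i|\,\|\Delta w_i\|^3 = O_p\big(n^{(1/2) + (7\mu/2) - 3p}\big) = o_p(1),$$

since $p > 2/5$ and $\mu < p/2$ imply that $(1/2) + (7\mu/2) - 3p < 0$.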

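(b-ii) Let $\hat S^{l}_{x\lambda}$ and $S^{l}_{x\lambda}$ denote the $l$th ($l = 1, \ldots, k$) elements of $\hat S_{x\lambda}$ and $S_{x\lambda}$, respectively. A third order Taylor series expansion yields

$$\sqrt{n h_n}\,(\hat S^{l}_{x\lambda} - S^{l}_{x\lambda}) = \sqrt{n h_n}\,B_1 (\hat\gamma_n - \gamma) + \frac{1}{2} \sqrt{n h_n}\,(\hat\gamma_n - \gamma)'\,B_2\,\frac{(\hat\gamma_n - \gamma)}{h_n} + B_3,$$

$$B_3 = \frac{\sqrt{n h_n}}{6}\,\frac{1}{n} \sum_{i=1}^n \frac{1}{h_n^4}\,K'''(c_i^*)\,\Delta x_i^{l}\,\Phi_i\,\Lambda_i W_i\,(\Delta w_i(\hat\gamma_n - \gamma))^3 .$$

We will show that $B_1$ and $B_2$ are $O_p(1)$ while $B_3 = o_p(1)$. The desired result will then follow from the fact that $1 - 2p < \mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$ and $\sqrt{n h_n}\,(\hat\gamma_n - \gamma) = O_p(n^{(1-\mu)/2 - p}) = o_p(1)$.

Note that $B_1$ is a $(1 \times q)$ row-vector. For its $j$th element, application of Lemma A1 with $s = 1$, $Z_i = \Delta x_i^{l}\,\Phi_i\,\Lambda_i\,\Delta w_i^{j}$, and $L(v) = K'(v)$ yields

$$E(B_1^{j}) = \frac{1}{h_n}\,O(h_n) = O(1) \quad \text{and} \quad \operatorname{var}(B_1^{j}) = \frac{1}{h_n^2}\,O\!\left(\frac{h_n^2}{n h_n}\right) = o(1),$$

since $E\big((\Delta x^{l})^2 \Lambda^2 (\Delta w^{j})^2 \mid W\big) < \infty$ for almost all $W$ and $\int |v K'(v)|\,dv < \infty$.

Similarly, we can show that the $jm$th element ($j, m = 1, \ldots, q$) of the $(q \times q)$ matrix $B_2$ is also bounded in probability, since $E\big((\Delta x^{l})^2 \Lambda^2 (\Delta w^{j})^2 (\Delta w^{m})^2 \mid W\big) < \infty$ for almost all $W$ and $\int |v K''(v)|\,dv < \infty$.

Next observe that

$$\|B_3\| \le M \sqrt{n}\,\frac{1}{h_n^{7/2}}\,\|\hat\gamma_n - \gamma\|^3\,\frac{1}{n} \sum_{i=1}^n \|\Delta x_i\|\,|\Lambda_i W_i|\,\|\Delta w_i\|^3 = o_p(1),$$

since under our assumptions $(1/2) + (7\mu/2) - 3p < 0$, $\gamma$ lies in a compact set, and $E(\|\Delta x\|\,\|\Delta w\|^4) < \infty$.

(c-i) Note that with $h_n = h \cdot n^{-\mu}$, the condition $\sqrt{n h_n}\,h_n^{r+1} \to \infty$ implies that $\mu < 1/(2(r+1)+1)$.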
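The bound on the bandwidth exponent stated at the start of part (c-i) follows from a one-line calculation. The block below is a worked sketch, assuming the bandwidth takes the form $h_n = h \cdot n^{-\mu}$ with $h > 0$ and reading the condition in (c-i) as $\sqrt{n h_n}\,h_n^{r+1} \to \infty$ (equivalently $n h_n^{2(r+1)+1} \to \infty$).

```latex
\begin{align*}
\sqrt{n h_n}\, h_n^{r+1}
  &= \left(h\, n^{1-\mu}\right)^{1/2} \left(h\, n^{-\mu}\right)^{r+1} \\
  &= h^{\,r + 3/2}\; n^{\frac{1}{2}\left[\,1 - \mu\,(2(r+1)+1)\,\right]} .
\end{align*}
% The right-hand side diverges if and only if the exponent of n is positive:
\[
1 - \mu\,(2(r+1)+1) > 0
\quad\Longleftrightarrow\quad
\mu < \frac{1}{2(r+1)+1}.
\]
% Example: for a second order kernel (r = 1) this gives \mu < 1/5.
```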

In what follows we will use the fact that for $r \ge 1$

Define $\hat S^{l}_{x\varepsilon}$ and $S^{l}_{x\varepsilon}$ as before. A third order Taylor series expansion yields

$$\frac{1}{h_n^{r+1}}\,(\hat S^{l}_{x\varepsilon} - S^{l}_{x\varepsilon}) = \frac{1}{\sqrt{n h_n}\,h_n^{r+1}} \left[ A_1 \frac{(\hat\gamma_n - \gamma)}{h_n} + \frac{1}{2} \frac{(\hat\gamma_n - \gamma)'}{h_n} A_2 \frac{(\hat\gamma_n - \gamma)}{h_n} \right] + \frac{1}{\sqrt{n h_n}\,h_n^{r+1}}\,A_3,$$

where $A_1$ and $A_2$ are defined as in the proof of part (b-i). As we showed there, both these quantities are bounded in probability for any $h_n$ that satisfies $h_n \to 0$ and $n h_n \to \infty$ as $n$ increases. Furthermore, from (1) above, $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$. Thus the first two terms of the sum above are $o_p(1)$. Now by (2)

(c-ii) Let $\hat S^{l}_{x\lambda}$ and $S^{l}_{x\lambda}$ be defined as before. A third order Taylor series expansion yields

where $B_1$ and $B_2$ are defined as in the proof of part (b-ii) and, as we showed there, they are bounded in probability for any $h_n$ that satisfies $n h_n \to \infty$ as $n$ increases. Thus the first two terms of the sum above are $o_p(1)$. Furthermore

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

----- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HARDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

----- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

----- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

----- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

----- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

----- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

----- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

----- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

----- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.


which is a second order bias-reducing kernel. The bandwidth sequence is $h_n = h \cdot n^{-1/(2(r+1)+1)} = h \cdot n^{-1/5}$, with $h = 1$. The panels on the right-hand side present the results for $\hat{\hat\beta}_n$, the estimator of the Corollary of Theorem 1, which corrects for asymptotic bias, where we use $\delta = 0.1$. Going from top to bottom of Table II, Panel A reports the results for the proposed estimator using the true $\gamma$ in the construction of the kernel weights.$^{15}$ In Panel B, $\gamma$ is estimated by conditional logit, denoted by $\hat\gamma_{L}$, which in this case will be consistent since all of the assumptions underlying the approach hold in our Monte Carlo design. In Panel C, $\gamma$ is estimated using the conditional maximum score estimator,$^{16}$ denoted by $\hat\gamma_{CMS}$, and in Panels D and E we use the smoothed conditional maximum score estimator,$^{17}$ denoted by $\hat\gamma_{SCMS}$. In Panel D, $\gamma$ is estimated at a rate faster than $\beta$, while in Panel E both $\beta$ and $\gamma$ are estimated at the same rate.
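The bandwidth sequence above, and the convergence rate it implies, can be computed directly. The following is a minimal sketch rather than the author's code: the function names are hypothetical, and the rate exponent $(r+1)/(2(r+1)+1)$ is an assumption obtained from $\sqrt{n h_n}$ with $h_n \propto n^{-1/(2(r+1)+1)}$ (it gives $n^{-2/5}$ for a second order kernel, the rate quoted in footnote 17).

```python
def bandwidth(n, r=1, h=1.0):
    """Bandwidth h_n = h * n**(-1 / (2(r+1)+1)) for a kernel with r vanishing moments."""
    return h * n ** (-1.0 / (2 * (r + 1) + 1))


def rate_exponent(r=1):
    """Exponent a such that sqrt(n * h_n) grows like n**a (assumed convergence rate)."""
    return (r + 1) / (2 * (r + 1) + 1)


def rmse_ratio(r=1, factor=4.0):
    """Approximate factor by which the RMSE shrinks when n is multiplied by `factor`."""
    return factor ** (-rate_exponent(r))


if __name__ == "__main__":
    for r in (1, 3, 5):
        print(r, bandwidth(1000, r), rate_exponent(r), rmse_ratio(r))
```

Under these assumptions, quadrupling the sample size with a second order kernel shrinks the RMSE by a factor of $4^{-2/5}$, which is larger than the factor $1/2$ that a $\sqrt{n}$-consistent estimator would deliver.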

From Table II we see that the proposed estimator is less biased than the naive OLS estimator, both with and without the asymptotic bias correction. Furthermore, this bias decreases with sample size, since the estimator is consistent, at a rate slower than $n^{-1/2}$ as predicted by the asymptotic theory. This may be seen by the fact that the RMSE decreases by less than half when we quadruple the sample size. Notice that the results do not change substantially whether we use the true $\gamma$ or we estimate it for the construction of the kernel weights, except when the smoothed maximum score approach is used. In the latter case (Panels D and E) the estimator is significantly more biased, although its RMSE is lower than in the other panels. This may be due to the relatively large finite sample bias of the smoothed maximum score estimates (see also Horowitz (1992)), which may be thought of as increasing the effective window width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.

15 In the construction of the kernel weights of both the infeasible estimator of Panel A and the feasible estimators of Panels B-E, the norm of $\gamma$ is set equal to one, so that the results across panels are comparable.

16 The CMS estimates are computed by maximizing the objective function $(1/n) \sum_{i=1}^n \Delta d_i\,1\{\Delta w_{i1} g_1 + \Delta w_{i2} g_2 \ge 0\}$ (see also equation (7) in Manski (1987)) over $g_1 = \sin(g)$ and $g_2 = \cos(g)$, with $g$ ranging in a 2000-point equispaced grid from 0 to $2\pi$.

17 The SCMS estimates are computed by maximizing the smoothed objective over all $g \in \mathbb{R}^2$ that have $|g_2| = 1$ and $g_1$ in a compact subset of $\mathbb{R}$, by the method of fast simulated annealing. Joel Horowitz kindly provided the optimization routine. In Panel D we set $L(v) = K_4(v)$ of Horowitz (1992, page 516), which implies that the estimator, denoted by $\hat\gamma_{SCMS4}$, converges in distribution at rate $n^{-4/9}$ (faster than the rate of $\hat\beta_n$, which in the case of a second order kernel is $n^{-2/5}$), so that the asymptotic theory of Section 3.1 is valid. In Panel E we use $L(v) = \Phi(v)$, where $\Phi$ is the standard normal cumulative distribution function. In this case the estimator, denoted by $\hat\gamma_{SCMS2}$, converges in distribution at the same rate as $\hat\beta_n$, $n^{-2/5}$. The SCMS estimates used in the construction of the kernel weights are corrected for asymptotic bias using $\delta = 0.1$ and are obtained by the two stage plug-in procedure, where in the first stage the bandwidth sequence is $\sigma_n = 0.5 \cdot n^{-1/(2h+1)}$ ($h = 2$ or 4), while the second stage uses the estimated optimal constant in the construction of the bandwidth. For details see Horowitz (1992) and Kyriazidou (1994).
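The grid search described in footnote 16 for the conditional maximum score (CMS) estimator can be sketched as below. This is an illustration under stated assumptions, not the routine used in the study: the function name `cms_grid`, the array layout, and the reading of the objective as $(1/n)\sum_i \Delta d_i\,1\{\Delta w_i g \ge 0\}$ are assumptions, and ties on the grid are broken by taking the first maximizer.

```python
import numpy as np


def cms_grid(dd, dw, n_grid=2000):
    """Grid-search maximum score over unit-norm g = (sin(a), cos(a)).

    dd : (n,) array of first-differenced selection indicators, values in {-1, 0, 1}
    dw : (n, 2) array of first-differenced selection regressors
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid)            # equispaced grid on [0, 2*pi]
    G = np.column_stack([np.sin(angles), np.cos(angles)])      # candidate directions, norm one
    index = dw @ G.T                                           # (n, n_grid) index values
    score = (dd[:, None] * (index >= 0)).mean(axis=0)          # objective at each grid point
    best = int(np.argmax(score))                               # first maximizer on the grid
    return G[best], float(score[best])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dw = rng.normal(size=(500, 2))
    g0 = np.array([np.sin(1.0), np.cos(1.0)])
    dd = np.where(dw @ g0 >= 0, 1, -1)                         # noiseless toy data
    print(cms_grid(dd, dw))
```

Parameterizing the candidate vector by a single angle enforces the unit-norm normalization mentioned in footnote 15 and reduces the two-dimensional search to one dimension.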

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\hat\beta_n$ and $\hat{\hat\beta}_n$ using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).$^{18}$
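The standard error of a median bias estimate, as outlined in footnote 18, requires the density of the estimator at its median. A minimal sketch follows, assuming the usual asymptotic formula $1/(2 f(m)\sqrt{R})$ for the median of $R$ replications, a normal kernel density estimate, and the rule-of-thumb bandwidth $1.06\,\hat\sigma R^{-1/5}$ (taken here as the form of the Silverman rule the footnote refers to; that identification is an assumption). The function name is hypothetical.

```python
import numpy as np


def median_se(estimates):
    """Asymptotic standard error of the sample median of Monte Carlo estimates."""
    x = np.asarray(estimates, dtype=float)
    m = x.size
    med = np.median(x)
    bw = 1.06 * x.std(ddof=1) * m ** (-0.2)      # rule-of-thumb bandwidth (assumed form)
    u = (med - x) / bw
    # Normal-kernel density estimate evaluated at the median.
    f_med = np.exp(-0.5 * u ** 2).sum() / (m * bw * np.sqrt(2.0 * np.pi))
    return 1.0 / (2.0 * f_med * np.sqrt(m))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(median_se(rng.normal(size=1000)))
```

The standard error of the median bias equals that of the median itself, since the true parameter value is a known constant in the simulation.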

In Panels C and D we use a fourth and a sixth order bias-reducing kernel$^{19}$ and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.
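The bias-reducing property of the higher order kernels can be checked numerically. The sketch below reads the coefficients in footnote 19 as a mixture of normal densities with weights (1.1, -0.1) and variances (1, 11) for the fourth-order kernel, and weights (1.5, -0.6, 0.1) and variances (1, 4, 9) for the sixth-order kernel. It also assumes each term carries the factor $1/\sqrt{2\pi}$ so that the kernel integrates to one, which the footnote as transcribed does not show. All function names are hypothetical.

```python
import numpy as np


def normal_mixture_kernel(v, weights, variances):
    """Weighted sum of N(0, s2) densities evaluated at v."""
    v = np.asarray(v, dtype=float)
    out = np.zeros_like(v)
    for a, s2 in zip(weights, variances):
        out += a * np.exp(-v ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return out


def k4(v):
    """Fourth-order kernel (assumed reading of footnote 19)."""
    return normal_mixture_kernel(v, (1.1, -0.1), (1.0, 11.0))


def k6(v):
    """Sixth-order kernel (assumed reading of footnote 19)."""
    return normal_mixture_kernel(v, (1.5, -0.6, 0.1), (1.0, 4.0, 9.0))


def moment(kernel, j, lim=60.0, num=400001):
    """Riemann-sum approximation of the j-th moment of the kernel on [-lim, lim]."""
    v = np.linspace(-lim, lim, num)
    dv = v[1] - v[0]
    return float(np.sum(v ** j * kernel(v)) * dv)


if __name__ == "__main__":
    for name, k in (("k4", k4), ("k6", k6)):
        print(name, [round(moment(k, j), 6) for j in (0, 2, 4, 6)])
```

With these coefficients the second moment of the fourth-order kernel vanishes while its fourth moment does not, and both the second and fourth moments of the sixth-order kernel vanish, which is what the orders require; odd moments are zero by symmetry.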

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat h$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat h$ is larger than the initial constant (here the initial bandwidth constant is set equal to one$^{20}$). Table V displays the mean of $\hat h$ across 1000 replications for different specifications of the initial constant for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be interpreted by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

19 The fourth-order kernel is $K(v) = 1.1 \exp(-v^2/2) - 0.1 \exp(-v^2/(2 \cdot 11))\,(1/\sqrt{11})$ and the sixth-order kernel is $K(v) = 1.5 \exp(-v^2/2) - 0.6 \exp(-v^2/(2 \cdot 4))\,(1/\sqrt{4}) + 0.1 \exp(-v^2/(2 \cdot 9))\,(1/\sqrt{9})$. See Bierens (1987).

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size n = 100000


TABLE III

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$, TRUE $\gamma$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction) | $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)
Columns within each group: Mean Bias (a), Median Bias (b), RMSE, MAD

Panel A: $K(v) = \phi(v)$, $h_n = 0.5 \cdot n^{-1/5}$
0.0040  0.3463  0.2140  -0.0017  0.0065
0.0064  0.1930  0.1308  0.0053  0.0023
0.0002  0.1119  0.0752  -0.0005  -0.0014

Panel B: $K(v) = \phi(v)$, $h_n = 3 \cdot n^{-1/5}$
0.0631  0.1550  0.1097  0.0542  0.0566
0.0459  0.0933  0.0626  0.0435  0.0426
0.0351  0.0565  0.0418  0.0316  0.0321

Panel C: fourth-order kernel $K(v)$, $h_n = n^{-1/9}$
0.0246  0.1966  0.1390  0.0080  0.0121
0.0159  0.1067  0.0723  0.0099  0.0003
0.0159  0.0582  0.0397  0.0051  0.0054

Panel D: sixth-order kernel $K(v)$, $h_n = n^{-1/13}$
0.0269  0.1973  0.1362  0.0002  0.0030
0.0144  0.1041  0.0719  0.0032  -0.0031
0.0170  0.0560  0.0391  -0.0006  -0.0002

a The estimated standard errors of the mean bias estimates for n = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

b The estimated standard errors of the median bias estimates for n = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = \hat h \cdot n^{-1/5}$, INITIAL $h = 1$, $K(v) = \phi(v)$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction) | $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)
Columns within each group: Mean Bias, Median Bias, RMSE, MAD

Panel A: True $\gamma$
0.1919  0.1287  0.0261
0.1053  0.0700  0.0330
0.0653  0.0507  0.0273

Panel B: $\hat\gamma_{L}$
0.1703  0.1191  0.0454
0.1000  0.0693  0.0465
0.0654  0.0504  0.0385

Panel C: $\hat\gamma_{CMS}$
0.2117  0.1329  0.0221
0.1114  0.0718  0.0246
0.0671  0.0507  0.0246

Panel D: $\hat\gamma_{SCMS}$
0.1543  0.1086  0.0705
0.1004  0.0740  0.0604
0.0658  0.0488  0.0401

TABLE V

Initial h = 0.5 | Initial h = 1 | Initial h = 2 | Initial h = 3

result of Theorem 1. It thus appears that for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat h$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

FIGURE 1. Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h_n = n^{-1/5}$, $K(v) = \phi(v)$. First row of panels: True $\gamma$ (Figs. 1a, 1b, 1c).
Note: Figures 1a, 1d, 1g: n = 250. Figures 1b, 1e, 1h: n = 1000. Figures 1c, 1f, 1i: n = 4000.
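The points of a quantile-quantile plot like those of Figure 1 can be computed as in the sketch below, which compares the ordered Monte Carlo estimates with quantiles of a normal distribution that has the same mean and variance. The function name and the plotting positions $(i - 0.5)/R$ are assumptions, since the text does not say which positions were used.

```python
import numpy as np
from statistics import NormalDist


def qq_points(estimates):
    """Return (normal quantiles, ordered estimates) for a Q-Q plot."""
    x = np.sort(np.asarray(estimates, dtype=float))
    m = x.size
    probs = (np.arange(1, m + 1) - 0.5) / m          # plotting positions (assumed)
    # Reference normal with the sample mean and sample standard deviation.
    ref = NormalDist(mu=float(x.mean()), sigma=float(x.std(ddof=1)))
    theo = np.array([ref.inv_cdf(p) for p in probs])
    return theo, x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theo, emp = qq_points(1.0 + 0.1 * rng.normal(size=1000))
    print(np.corrcoef(theo, emp)[0, 1])
```

Points lying close to the 45-degree line indicate that the normal approximation is adequate; plotting `emp` against `theo` with any plotting library reproduces the layout of one panel.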

Finally, we examine the size of t tests where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$. To this end we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
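The rejection fractions reported in Table VI are empirical sizes: the share of Monte Carlo samples in which the t statistic exceeds its critical value. A minimal sketch follows, assuming two-sided tests with standard normal critical values (the text does not state this explicitly); the function name and argument layout are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def empirical_size(estimates, std_errors, beta0=1.0, levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of replications rejecting H0: beta = beta0 at each nominal level."""
    t = (np.asarray(estimates, dtype=float) - beta0) / np.asarray(std_errors, dtype=float)
    sizes = {}
    for a in levels:
        crit = NormalDist().inv_cdf(1.0 - a / 2.0)   # two-sided normal critical value
        sizes[a] = float(np.mean(np.abs(t) > crit))
    return sizes


if __name__ == "__main__":
    est = 1.0 + np.array([0.0, 1.0, 1.7, 2.0, 3.0])
    print(empirical_size(est, np.ones(5)))
```

If the normal approximation and the standard errors are accurate, each fraction should be close to its nominal level as the number of replications grows.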

TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$: $h_n = n^{-1/5}$, $K(v) = \phi(v)$

Column groups: $\hat\beta_n$ (Without Asymptotic Bias Correction) | $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)
Nominal levels within each group: 0.01, 0.05, 0.10, 0.20

Panel A: True $\gamma$
0.1610  0.2530  0.0590
0.1240  0.2180  0.0260
0.1120  0.2260  0.0210

Panel B: $\hat\gamma_{L}$
0.1580  0.2680  0.0450
0.1160  0.2140  0.0230
0.1140  0.2250  0.0180

Panel C: $\hat\gamma_{CMS}$
0.1600  0.2720  0.0610
0.1170  0.2160  0.0350
0.1180  0.2390  0.0240

Panel D: $\hat\gamma_{SCMS}$
0.1430  0.2570  0.0280
0.1220  0.2250  0.0190
0.1230  0.2430  0.0250

5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will be also left for future investigation.
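The second step of the procedure summarized above can be illustrated with a kernel-weighted least squares regression on first differences. The sketch below is an assumption-laden reading, not the paper's exact formula: it uses only individuals observed in both periods, weights each by a normal-density kernel of the estimated first-differenced selection index divided by the bandwidth, and solves the weighted normal equations. The function name, argument names, and the choice of the normal kernel are all hypothetical.

```python
import numpy as np


def kernel_weighted_fd(dy, dx, dw, both_selected, gamma_hat, h_n):
    """Kernel-weighted OLS of dy on dx using individuals selected in both periods.

    dy : (n,) first-differenced outcomes      dx : (n, k) first-differenced regressors
    dw : (n, q) first-differenced selection regressors
    both_selected : (n,) 0/1 indicator        gamma_hat : (q,) first-step estimate
    """
    keep = np.asarray(both_selected).astype(bool)
    dy, dx, dw = dy[keep], dx[keep], dw[keep]
    u = dw @ gamma_hat / h_n                                       # scaled index difference
    psi = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * h_n)     # kernel weights
    xw = dx * psi[:, None]
    return np.linalg.solve(xw.T @ dx, xw.T @ dy)                   # weighted normal equations


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dx, dw = rng.normal(size=(200, 2)), rng.normal(size=(200, 2))
    beta = np.array([1.0, -2.0])
    sel = (rng.uniform(size=200) < 0.7).astype(int)
    print(kernel_weighted_fd(dx @ beta, dx, dw, sel, np.array([0.6, 0.8]), 0.5))
```

Observations whose selection index changes little between periods receive the largest weights, which is the sense in which the selection effect is differenced out along with the individual effect.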

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMA A1: Let $S_n = (1/n) \sum_{i=1}^n (1/h_n)\,L(W_i/h_n)\,Z_i\,W_i^{s}$, $s \ge 0$, where $\{(Z_i, W_i)\}_{i=1}^n$ is a random sample from a distribution that has $E(|Z|^2 \mid W) < M < \infty$ for almost all $W$, and the function $L$ satisfies $\int |v^{s} L(v)|\,dv < M$. Then $E(S_n) = O(h_n^{s})$ and $\operatorname{var}(S_n) = O(h_n^{2s}/(n h_n))$. Thus, for $s \ge 1$, $S_n \xrightarrow{p} 0$, while for $s = 0$, $S_n \xrightarrow{p} f_W(0)\,E(Z \mid W = 0) \int L(v)\,dv$, provided that $E(Z \mid W)$ is continuous at $W = 0$.
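The two probability limits in Lemma A1 can be illustrated by simulation. The sketch below is a toy check under assumed choices that are not in the text: $W$ standard normal, $Z = 1 + W + \text{noise}$ (so that $E(Z \mid W = 0) = 1$), $L$ the standard normal density (so that $\int L(v)\,dv = 1$), and bandwidth $h_n = n^{-1/5}$. The function name is hypothetical.

```python
import numpy as np


def kernel_average(z, w, h, s=0):
    """S_n = (1/n) * sum_i (1/h) * L(w_i / h) * z_i * w_i**s with L the normal density."""
    u = w / h
    L = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return float(np.mean(L / h * z * w ** s))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200_000
    w = rng.normal(size=n)
    z = 1.0 + w + rng.normal(size=n)
    h = n ** (-0.2)
    # s = 0: limit is f_W(0) * E(Z | W = 0) * integral of L = 1 / sqrt(2 * pi).
    print(kernel_average(z, w, h, s=0), 1.0 / np.sqrt(2.0 * np.pi))
    # s = 1: limit is zero.
    print(kernel_average(z, w, h, s=1))
```

In this design the $s = 0$ average settles near $f_W(0) \approx 0.399$ and the $s = 1$ average near zero, in line with the lemma.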

PROOF: Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

LEMMA A2 (Liapounov CLT for double arrays): Let $\bar\xi_n = (1/\sqrt{n}) \sum_{i=1}^n \xi_{in}$, where $\{\xi_{in}\}_{i=1}^n$ is an independent sequence of scalar random variables that satisfies $E(\xi_{in}) = 0$, $\operatorname{var}(\xi_{in}) < \infty$, $\operatorname{var}(\bar\xi_n) \to V < \infty$, and $\sum_{i=1}^n E|\xi_{in}/\sqrt{n}|^{2+\delta} \to 0$ for some $\delta \in (0,1)$ as $n \to \infty$. Then $\bar\xi_n \xrightarrow{d} N(0, V)$.

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1974).

COROLLARY = where (Z U)l= 1s a random sample from a Al Let ( ( I amp)L(w~)z d~stnbutlonsuch that E(ZI W) = 0 and E(IZI 1 W) lt M lt w for almost all W E(Z2 I W) IS conhnuous at W = 0 and the functlon L satrsfies llL(v)l dv lt 53 Then KS= ( l ix)~l=amp N(0

f W ( 0 ) ~ ( Z 2 I ~ =O ) ~ L ( V ) ~ ~ V )

PROOFOF LEMMA1 (a) Apply Lemma A1 with 2= Ax Ax di (1 j = I k) s = 0 and L(v ) = K(v)

(b-i) Apply Lemma A2 with tt= c1(1 amp)K(Uh) Ax Ac where c is a k X 1 vector of constants such that cc = 1

(b-ii) Note that by Assumption R5 Ah = AWThus wc may write

S = Ax( 1 ~ 1 ) I 3 ~ = ~ ( l h ) K ( H ( h )

Therefore E(S) = l(lh)K(Wh)Wg(W) dW where g(W) - E(Axr AlW)fw(W) is by assumption r times colltinuously differentiable with derivatives that are bounded on the support of W and has g(0) lt m A Taylor series expansion of g() around 0 and a change of variables W = vh lead to

17

1

1360 EKATERINI KYRIAZIDOU

for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKSii) In what follows A4 stands for a generic constant which is the uppcr bound of certain quantities

(ii) We define the matrix norm IIAll= dtrace(AA) (iii) In the Taylor series expansions c stands for a generic value between U and

PROOFOF LEMMA2 (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a

SAMPLE SELECTION MODEL 1361

(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that A and A are 0(1) while A = o(l) The desired result will then follow from the fact that p lt p 2 implies that h i 1 ( - y ) = Op(niL-1= o(l)

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

We will show that Bl and B are 0(1) while B3 = o(l) Thc desired result will thenfollow from the fact that 1 - 2 p lt u lt p 2 implies that hi1(- y ) = Op(nL-) = o(l) and - y ) = o(n -~-) = o(l)

1362 EKATERINI KYRIAZIDOU

Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for r r 1

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+

SAMPLE SELECTION MODEL 1363

where Ai and A are defined as in the proof of part (b-1) As we showed there both these quantities are bounded in probability for any h that satisfies h -O and nh -t 13 as n increases Furthermore from (1) above hi1( - y) = op(l) T ~ L I S OP(nF-) = the first two terms of the sum above are o(l) Now by (21

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where Bi and B2 are defined as in the proof of part (b-ii) and as we showed there they arc houndcd in probability for any I that satisfies nh + 13 as n increases Thus the first two terms of the sum above are o(l) Furthermore

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

CHAMBERLAIN, G. (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

HECKMAN, J. J. (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

HECKMAN, J. J. (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

HONORÉ, B. E. (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

KYRIAZIDOU, E. (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

MANSKI, C. (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

MANSKI, C. (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

POWELL, J. L. (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

RASCH, G. (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.



width used in the estimation of $\beta$. Furthermore, we notice that the results are very similar when $\gamma$ is estimated at the same rate as $\beta$ (Panel E) relative to the case where it is estimated faster than $\beta$ (Panel D). Comparing the right and left sides of Table II, we see that the asymptotic bias correction does decrease the estimated (mean and median) bias of the estimator; it invariably, however, increases its variability.

In Table III we investigate the sensitivity of the (infeasible) estimator with respect to the choice of the bandwidth constant and the choice of the kernel function. Panels A and B present the results for $\hat{\beta}_n$ and $\hat{\hat{\beta}}_n$ using a bandwidth constant $h$ equal to 0.5 and 3, respectively, and a second order bias-reducing kernel. As expected, the estimator's bias increases as we increase the bandwidth, while the RMSE decreases. The increase in both mean and median bias appears quite large, which indicates that point estimates may be quite sensitive to the choice of bandwidth. In order to give a sense of the precision with which these biases are estimated, we provide at the bottom of Table III their estimated standard errors for the two sets of experiments that use 0.5 and 3 as bandwidth constant (Panels A and B).$^{18}$
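The summary measures reported in these tables can be computed from the Monte Carlo replications along the lines of the following minimal sketch. The function name `mc_summary`, the simulated draws, and the reading of MAD as the median absolute deviation about the median are assumptions made for illustration; the text does not spell out the implementation.

```python
import numpy as np

def mc_summary(estimates, true_value):
    """Summarize Monte Carlo replications of a scalar estimator."""
    est = np.asarray(estimates, dtype=float)
    err = est - true_value
    return {
        "mean_bias": float(err.mean()),
        "median_bias": float(np.median(err)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        # Assumption: MAD is the median absolute deviation about the median.
        "mad": float(np.median(np.abs(est - np.median(est)))),
        # Standard error of the mean bias across replications.
        "se_mean_bias": float(est.std(ddof=1) / np.sqrt(est.size)),
    }

# Hypothetical replications: true value 1, a small bias, 1000 draws.
rng = np.random.default_rng(0)
draws = 1.0 + 0.02 + 0.15 * rng.standard_normal(1000)
summary = mc_summary(draws, true_value=1.0)
```

The squared RMSE decomposes into squared mean bias plus variance, which is why a larger bandwidth can raise the bias and still lower the RMSE.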

In Panels C and D we use a fourth and a sixth order bias-reducing kernel$^{19}$ and set $h_n = n^{-1/(2(r+1)+1)}$ with $r = 3$ and $r = 5$, respectively. A comparison of Panels II-A, III-C, and III-D suggests that the use of higher order kernels speeds up the rate of convergence of the estimator, although there does not appear to be much gain from increasing the order of the kernel from four to six.
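As a hedged illustration of what "bias-reducing" means here, the sketch below builds fourth and sixth order kernels as mixtures of normal densities with the weights and variances given in footnote 19, and checks numerically which moments vanish. The normalizing factor $1/\sqrt{2\pi}$ (so that each kernel integrates to one), the function names, and the integration grid are assumptions of this sketch.

```python
import numpy as np

def normal_pdf(v, var=1.0):
    """Density of N(0, var) evaluated at v."""
    return np.exp(-v ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def kernel_order4(v):
    # Weights 1.1 and -0.1 on N(0, 1) and N(0, 11): second moment cancels.
    return 1.1 * normal_pdf(v, 1.0) - 0.1 * normal_pdf(v, 11.0)

def kernel_order6(v):
    # Weights 1.5, -0.6, 0.1 on N(0, 1), N(0, 4), N(0, 9):
    # second and fourth moments cancel.
    return (1.5 * normal_pdf(v, 1.0)
            - 0.6 * normal_pdf(v, 4.0)
            + 0.1 * normal_pdf(v, 9.0))

def bandwidth(n, r, h=1.0):
    """Bandwidth h * n^(-1/(2(r+1)+1)) for a kernel of order r + 1."""
    return h * n ** (-1.0 / (2 * (r + 1) + 1))

def moment(kernel, j, grid):
    """Trapezoid-rule approximation of the integral of v^j K(v)."""
    y = grid ** j * kernel(grid)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(grid)) / 2.0)

grid = np.linspace(-60.0, 60.0, 240001)
moments4 = [moment(kernel_order4, j, grid) for j in (0, 2, 4)]
moments6 = [moment(kernel_order6, j, grid) for j in (0, 2, 4, 6)]
```

With $r = 3$ and $r = 5$ the function `bandwidth` gives the rates $n^{-1/9}$ and $n^{-1/13}$ used in Panels C and D.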

Table IV explores the properties of the proposed estimator when the plug-in method described in Section 3.2 is used. The specification is the same as in Table II. Comparing Panels A-D in Tables II and IV, we see that the bias of the estimates increases when the optimal bandwidth constant $\hat{h}$ is used, while their RMSE decreases (except in Panel IV-D). This is because in general $\hat{h}$ is larger than the initial constant (here the initial bandwidth constant is set equal to one$^{20}$). Table V displays the mean of $\hat{h}$ across 1000 replications for different specifications of the initial constant for the case of the infeasible estimator. We find that the means of the estimates are increasing in the initial bandwidth constant (although this is not necessarily true for all 1000 samples). Our finding may be explained by the asymptotic bias term being in general poorly estimated in the particular Monte Carlo design used in this study. Indeed, we find that for the sample sizes considered here the estimated asymptotic bias of the estimator decreases with the bandwidth constant $h$, contrary to the asymptotic

18 To estimate the standard errors for the median bias, we need to calculate the estimator's density. This is estimated using a normal kernel and the rule-of-thumb bandwidth suggested by Silverman (1986, equation 3.28).

19 The fourth-order kernel is $K_4(v) = 1.1\exp(-v^2/2) - 0.1\exp(-v^2/(2 \cdot 11))(1/\sqrt{11})$ and the sixth-order kernel is $K_6(v) = 1.5\exp(-v^2/2) - 0.6\exp(-v^2/(2 \cdot 4))(1/\sqrt{4}) + 0.1\exp(-v^2/(2 \cdot 9))(1/\sqrt{9})$. See Bierens (1987).

20 We chose the initial $h$ equal to one, as the mean squared error of the distribution of the (infeasible) estimator in the 1000 replications was found to be minimized in that neighborhood when a rough search over a 10-point grid from 0.5 to 10 was performed for a sample size n = 100000
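The calculation described in footnote 18 can be sketched as follows. This is a minimal illustration, assuming the rule of thumb takes the familiar form $1.06\,\sigma\,R^{-1/5}$ for $R$ replications and that the standard error of the sample median is approximated by $1/(2\hat{f}(m)\sqrt{R})$, where $\hat{f}(m)$ is the estimated density at the median; the function names and the simulated draws are hypothetical.

```python
import numpy as np

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth (assumed form: 1.06 * sd * R^(-1/5))."""
    x = np.asarray(x, dtype=float)
    return 1.06 * x.std(ddof=1) * x.size ** (-1.0 / 5.0)

def normal_kde_at(point, x, h):
    """Normal-kernel density estimate of the sample x at a single point."""
    u = (point - np.asarray(x, dtype=float)) / h
    return float(np.mean(np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)) / h)

def median_standard_error(x):
    """Asymptotic standard error of the sample median: 1 / (2 f(m) sqrt(R))."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    f_med = normal_kde_at(med, x, silverman_bandwidth(x))
    return 1.0 / (2.0 * f_med * np.sqrt(x.size))

# Hypothetical replications of an estimator centered at 1.
rng = np.random.default_rng(1)
replications = 1.0 + 0.15 * rng.standard_normal(1000)
se_median = median_standard_error(replications)
```

For roughly normal replications this standard error exceeds that of the mean by a factor close to $\sqrt{\pi/2}$, which is in line with the median-bias standard errors at the bottom of Table III being somewhat larger than those of the mean bias.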


TABLE III

FINITE SAMPLE PROPERTIES OF $\hat{\beta}_n$ AND $\hat{\hat{\beta}}_n$; TRUE $\gamma$

$\hat{\beta}_n$ (Without Asymptotic Bias Correction)    $\hat{\hat{\beta}}_n$ (With Asymptotic Bias Correction)

Mean Bias    Median Bias    RMSE    MAD    Mean Bias    Median Bias    RMSE    MAD

Panel A: $K(v) = \phi(v)$, $h_n = 0.5 \cdot n^{-1/5}$
0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B: $K(v) = \phi(v)$, $h_n = 3 \cdot n^{-1/5}$
0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C: $K(v) = K_4(v)$, $h_n = n^{-1/9}$
0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D: $K(v) = K_6(v)$, $h_n = n^{-1/13}$
0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

a The estimated standard errors of the mean bias estimates for n = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

The estimated standard errors of the median bias estimates for n = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat{\beta}_n$ AND $\hat{\hat{\beta}}_n$; $h_n = \hat{h} \cdot n^{-1/5}$, INITIAL $h = 1$, $K(v) = \phi(v)$

$\hat{\beta}_n$ (Without Asymptotic Bias Correction)    $\hat{\hat{\beta}}_n$ (With Asymptotic Bias Correction)

Mean Bias    Median Bias    RMSE    MAD    Mean Bias    Median Bias    RMSE    MAD

Panel A: True $\gamma$
0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B: $\hat{\gamma}_L$
0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C: $\hat{\gamma}_{CMS}$
0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D: $\hat{\gamma}_{SCMS}$
0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401


TABLE V

Initial h = 0.5    Initial h = 1    Initial h = 2    Initial h = 3

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat{h}$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat{\beta}_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat{\beta}_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the

[Figure 1, panels 1a-1c: True $\gamma$]

Note: Figures 1a, 1d, 1g: n = 250. Figures 1b, 1e, 1h: n = 1000. Figures 1c, 1f, 1i: n = 4000.

FIGURE 1. Quantile-quantile plots of $\hat{\beta}_n$ against a Normal; $h_n = n^{-1/5}$, $K(v) = \phi(v)$.

estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.
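The quantile-quantile comparison described above can be reproduced along the following lines. This is a sketch under stated assumptions: the plotting positions $(i - 0.5)/R$, the function name `qq_against_normal`, and the simulated replications are illustrative choices, not taken from the paper.

```python
import numpy as np
from statistics import NormalDist

def qq_against_normal(estimates):
    """Pair sample quantiles with quantiles of a normal distribution that
    has the same mean and variance as the sample."""
    x = np.sort(np.asarray(estimates, dtype=float))
    probs = (np.arange(1, x.size + 1) - 0.5) / x.size  # assumed plotting positions
    reference = NormalDist(mu=float(x.mean()), sigma=float(x.std(ddof=1)))
    theoretical = np.array([reference.inv_cdf(p) for p in probs])
    return theoretical, x

# Hypothetical replications of the estimator.
rng = np.random.default_rng(2)
beta_draws = 1.0 + 0.1 * rng.standard_normal(1000)
normal_q, sample_q = qq_against_normal(beta_draws)
qq_correlation = float(np.corrcoef(normal_q, sample_q)[0, 1])
```

Points lying close to the 45-degree line (equivalently, a correlation near one between the two sets of quantiles) indicate that the normal approximation is adequate.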

Finally, we examine the size of t tests, where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value $\beta_0 = 1$. To this end, we construct t statistics for $\hat{\beta}_n$ and $\hat{\hat{\beta}}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that, although we report the results of the t tests for $\hat{\beta}_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat{\beta}_n$ (and $\hat{\hat{\beta}}_n$) does not obtain in this case due to the slow rate of convergence of $\hat{\gamma}_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
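The rejection fractions reported in Table VI can be computed as in the sketch below. It assumes two-sided tests with standard normal critical values; the function name `empirical_size` and the simulated estimates and standard errors are hypothetical stand-ins for the Monte Carlo output.

```python
import numpy as np
from statistics import NormalDist

def empirical_size(estimates, std_errors, null_value,
                   levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of replications in which a two-sided t test rejects the
    null at each nominal level (normal critical values assumed)."""
    t_stats = (np.asarray(estimates, dtype=float) - null_value) \
        / np.asarray(std_errors, dtype=float)
    sizes = {}
    for level in levels:
        critical = NormalDist().inv_cdf(1.0 - level / 2.0)
        sizes[level] = float(np.mean(np.abs(t_stats) > critical))
    return sizes

# Hypothetical replications in which the null is true and the
# standard errors are correct, so actual and nominal size should agree.
rng = np.random.default_rng(3)
estimates = 1.0 + 0.1 * rng.standard_normal(20000)
std_errors = np.full(20000, 0.1)
sizes = empirical_size(estimates, std_errors, null_value=1.0)
```

Actual levels above the nominal ones, as in the smaller samples of Table VI, would show up here as entries of `sizes` exceeding their keys.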

TABLE VI

SIZE OF t TESTS USING $\hat{\beta}_n$ AND $\hat{\hat{\beta}}_n$; $h_n = n^{-1/5}$, $K(v) = \phi(v)$

$\hat{\beta}_n$ (Without Asymptotic Bias Correction)    $\hat{\hat{\beta}}_n$ (With Asymptotic Bias Correction)

0.01    0.05    0.10    0.20    0.01    0.05    0.10    0.20

Panel A: True $\gamma$
0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B: $\hat{\gamma}_L$
0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C: $\hat{\gamma}_{CMS}$
0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D: $\hat{\gamma}_{SCMS}$
0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250


5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues are also left for future investigation.
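The second step of the procedure summarized above can be sketched as a kernel-weighted least squares regression on first differences, where pairs of observations whose estimated selection indices are close receive more weight. The sketch below is a hedged illustration, not the paper's code: the function name, the normal kernel, the variable names (`dy`, `dx`, `dw`, `both_observed`), and the simulated data are assumptions.

```python
import numpy as np

def kernel_weighted_fd_ols(dy, dx, dw, gamma_hat, both_observed, h_n):
    """Weighted least squares on first differences.

    dy, dx        : differenced outcome (n,) and regressors (n, k)
    dw            : differenced selection-equation regressors (n, q)
    gamma_hat     : first-step estimate of the selection coefficients (q,)
    both_observed : indicator that the outcome is observed in both periods
    h_n           : bandwidth
    """
    u = (dw @ gamma_hat) / h_n
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)  # assumed normal kernel
    weights = both_observed * kernel / h_n
    sxx = (dx * weights[:, None]).T @ dx
    sxy = (dx * weights[:, None]).T @ dy
    return np.linalg.solve(sxx, sxy)

# Hypothetical noise-free data: the weighted regression recovers beta exactly,
# and units not observed in both periods do not affect the result.
rng = np.random.default_rng(4)
n, beta_true = 500, np.array([1.0, 2.0])
dx = rng.standard_normal((n, 2))
dw = rng.standard_normal((n, 2))
both_observed = rng.random(n) < 0.7
dy = dx @ beta_true
dy[~both_observed] = 999.0  # placeholder values that receive zero weight
beta_hat = kernel_weighted_fd_ols(dy, dx, dw, np.array([1.0, -1.0]),
                                  both_observed, h_n=n ** (-1.0 / 5.0))
```

The weights shrink toward zero as the difference in the estimated selection index grows relative to the bandwidth, which is why the bandwidth choice matters so much for the finite sample results.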

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMA A1: Let $S_n = (1/n)\sum_{i=1}^{n}(1/h_n)L(W_i/h_n)Z_iW_i^s$, $s \ge 0$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution that has $E(|Z|^2 \mid W) < M < \infty$ for almost all $W$, and the function $L$ satisfies $\int |v^sL(v)|\,dv < M$. Then $E(S_n) = O(h_n^s)$ and $\mathrm{var}(S_n) = O(h_n^{2s}/(nh_n))$. Thus, for $s \ge 1$, $S_n \to_p 0$, while for $s = 0$, $S_n \to_p f_W(0)E(Z \mid W = 0)\int L(v)\,dv$, provided that $E(Z \mid W)$ is continuous at $W = 0$.

PROOF: Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.

LEMMAA2 (Liapounov CLT for doublc arrays) Let = (1 l t l )~= I tiwhere an Independent sequence of scalar random ~arrables that satis$es E( (I0 var( (I lt rn var(= +

V lt aand I3= El ( 61 +0 for some 8 E (01) as n + Then Jizh~N(0 V)

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1974).

COROLLARY = where (Z U)l= 1s a random sample from a Al Let ( ( I amp)L(w~)z d~stnbutlonsuch that E(ZI W) = 0 and E(IZI 1 W) lt M lt w for almost all W E(Z2 I W) IS conhnuous at W = 0 and the functlon L satrsfies llL(v)l dv lt 53 Then KS= ( l ix)~l=amp N(0

f W ( 0 ) ~ ( Z 2 I ~ =O ) ~ L ( V ) ~ ~ V )

PROOFOF LEMMA1 (a) Apply Lemma A1 with 2= Ax Ax di (1 j = I k) s = 0 and L(v ) = K(v)

(b-i) Apply Lemma A2 with tt= c1(1 amp)K(Uh) Ax Ac where c is a k X 1 vector of constants such that cc = 1

(b-ii) Note that by Assumption R5 Ah = AWThus wc may write

S = Ax( 1 ~ 1 ) I 3 ~ = ~ ( l h ) K ( H ( h )

Therefore E(S) = l(lh)K(Wh)Wg(W) dW where g(W) - E(Axr AlW)fw(W) is by assumption r times colltinuously differentiable with derivatives that are bounded on the support of W and has g(0) lt m A Taylor series expansion of g() around 0 and a change of variables W = vh lead to


for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$.

(iii) In the Taylor series expansions, $c^*$ stands for a generic value between U and

PROOF OF LEMMA 2: (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a


(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that $A_1$ and $A_2$ are $O_p(1)$ while $A_3 = o_p(1)$. The desired result will then follow from the fact that $\mu < \delta/2$ implies that $h_n^{-1}(\hat{\gamma}_n - \gamma) = O_p(n^{\mu-\delta}) = o_p(1)$.

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next observe that, since $\delta > 2/5$ and $\mu < \delta/2$ imply that $(1/2) + (7\mu/2) - 3\delta < 0$,

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

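We will show that $B_1$ and $B_2$ are $O_p(1)$ while $B_3 = o_p(1)$. The desired result will then follow from the fact that $1 - 2\delta < \mu < \delta/2$ implies that $h_n^{-1}(\hat{\gamma}_n - \gamma) = O_p(n^{\mu-\delta}) = o_p(1)$ and $\sqrt{nh_n}(\hat{\gamma}_n - \gamma) = O_p(n^{(1-\mu)/2-\delta}) = o_p(1)$.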


Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a




TABLE III

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; TRUE $\gamma$

Columns: $\hat\beta_n$ (Without Asymptotic Bias Correction): Mean Bias (a), Median Bias (b), RMSE, MAD. $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD.

Panel A: $K(\nu) = \phi(\nu)$, $h_n = 0.5 \cdot n^{-1/5}$

0.0040 0.3463 0.2140 -0.0017 0.0065 0.0064 0.1930 0.1308 0.0053 0.0023 0.0002 0.1119 0.0752 -0.0005 -0.0014

Panel B: $K(\nu) = \phi(\nu)$, $h_n = 3 \cdot n^{-1/5}$

0.0631 0.1550 0.1097 0.0542 0.0566 0.0459 0.0933 0.0626 0.0435 0.0426 0.0351 0.0565 0.0418 0.0316 0.0321

Panel C: $K(\nu) = K_3(\nu)$, $h_n = n^{-1/9}$

0.0246 0.1966 0.1390 0.0080 0.0121 0.0159 0.1067 0.0723 0.0099 0.0003 0.0159 0.0582 0.0397 0.0051 0.0054

Panel D: $K(\nu) = K_5(\nu)$, $h_n = n^{-1/13}$

0.0269 0.1973 0.1362 0.0002 0.0030 0.0144 0.1041 0.0719 0.0032 -0.0031 0.0170 0.0560 0.0391 -0.0006 -0.0002

(a) The estimated standard errors of the mean bias estimates for n = 250, 1000, and 4000 are 0.0110, 0.0061, and 0.0035 for Panel A, and 0.0045, 0.0026, and 0.0014 for Panel B, respectively.

(b) The estimated standard errors of the median bias estimates for n = 250, 1000, and 4000 are 0.0136, 0.0077, and 0.0044 for Panel A, and 0.0059, 0.0033, and 0.0018 for Panel B, respectively.

TABLE IV

FINITE SAMPLE PROPERTIES OF $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = \hat{h} \cdot n^{-1/5}$, INITIAL $h = 1$, $K(\nu) = \phi(\nu)$

Columns: $\hat\beta_n$ (Without Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD. $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction): Mean Bias, Median Bias, RMSE, MAD.

Panel A: True $\gamma$

0.1919 0.1287 0.0261 0.1053 0.0700 0.0330 0.0653 0.0507 0.0273

Panel B: $\hat\gamma_{L}$

0.1703 0.1191 0.0454 0.1000 0.0693 0.0465 0.0654 0.0504 0.0385

Panel C: $\hat\gamma_{CMS}$

0.2117 0.1329 0.0221 0.1114 0.0718 0.0246 0.0671 0.0507 0.0246

Panel D: $\hat\gamma_{SCMS}$

0.1543 0.1086 0.0705 0.1004 0.0740 0.0604 0.0658 0.0488 0.0401

TABLE V

Columns: Initial $h = 0.5$; Initial $h = 1$; Initial $h = 2$; Initial $h = 3$

result of Theorem 1. It thus appears that, for the particular design, small sample bias is more important than asymptotic bias. The sensitivity of the optimal constant estimate $\hat{h}$ to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted.
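The bandwidth sequences used in the experiments have the form $h_n = h \cdot n^{-\mu}$, and the Appendix ties the exponent to the boundary rate $1/(2(r+1)+1)$. The following is a minimal sketch of that rule, not the plug-in procedure of the paper. The function name `bandwidth` and its arguments are hypothetical, and the mapping of $r = 1$ to the second order kernel case ($n^{-1/5}$) is an assumption read off the table headings.

```python
def bandwidth(n, h_const=1.0, r=1):
    """Return h_n = h_const * n**(-mu) with mu = 1 / (2*(r + 1) + 1).

    Assumption: r = 1 corresponds to a second order kernel, which
    gives the familiar n**(-1/5) rate; r = 3 gives n**(-1/9) and
    r = 5 gives n**(-1/13).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if r < 1:
        raise ValueError("r must be at least 1")
    mu = 1.0 / (2 * (r + 1) + 1)
    return h_const * n ** (-mu)


if __name__ == "__main__":
    # Sample sizes used in the Monte Carlo design.
    for n in (250, 1000, 4000):
        print(n, round(bandwidth(n), 4))
```

Because the constant `h_const` multiplies the whole sequence, doubling it doubles the bandwidth at every sample size, which is one way to see why results can be sensitive to the initial constant.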

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator. In Figure 1 we plot the quantiles of $\hat\beta_n$ against those of a normal random variable with the same mean and variance as the sample mean and sample variance of $\hat\beta_n$. Such quantile-quantile plots are provided for different sample sizes and for the true and the estimated values of $\gamma$, using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

[Figure 1, panels shown for True $\gamma$: Fig. 1a, Fig. 1b, Fig. 1c]

Note: Figures 1a, 1d, 1g: n = 250. Figures 1b, 1e, 1h: n = 1000. Figures 1c, 1f, 1i: n = 4000.

FIGURE 1.-Quantile-quantile plots of $\hat\beta_n$ against a Normal; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$.
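The quantile-quantile comparison described here can be sketched as follows. This is an illustration only: the function name `qq_points` and the plotting positions $(i + 0.5)/m$ are assumptions, since the text does not say which plotting positions were used.

```python
from statistics import NormalDist, mean, stdev


def qq_points(estimates):
    """Pair each ordered estimate with the matching quantile of a normal
    distribution that has the same mean and variance as the estimates.

    Returns a list of (normal_quantile, sample_quantile) tuples.
    """
    xs = sorted(estimates)
    m = len(xs)
    if m < 2:
        raise ValueError("need at least two estimates")
    reference = NormalDist(mean(xs), stdev(xs))
    # Plotting positions (i + 0.5) / m are an assumption of this sketch.
    return [(reference.inv_cdf((i + 0.5) / m), x) for i, x in enumerate(xs)]


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    draws = [1.0 + 0.1 * rng.gauss(0.0, 1.0) for _ in range(1000)]
    for q_normal, q_sample in qq_points(draws)[::250]:
        print(round(q_normal, 3), round(q_sample, 3))
```

If the finite sample distribution is close to normal, the returned points lie near the 45-degree line.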

Finally, we examine the size of t tests where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$. To this end we construct t statistics for $\hat\beta_n$ and $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance level. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests for $\hat\beta_n$ using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat\gamma_{CMS}$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.
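The size calculation reported in Table VI amounts to counting, across Monte Carlo replications, how often the t statistic exceeds the critical value at each nominal level. A minimal sketch follows, assuming two-sided tests with standard normal critical values (the text does not state this explicitly); the function name `rejection_rates` is hypothetical.

```python
from statistics import NormalDist


def rejection_rates(estimates, std_errors, true_value,
                    levels=(0.01, 0.05, 0.10, 0.20)):
    """Fraction of replications in which |t| exceeds the two-sided
    standard normal critical value, for each nominal level."""
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need equally many estimates and standard errors")
    t_stats = [(b - true_value) / se for b, se in zip(estimates, std_errors)]
    rates = {}
    for alpha in levels:
        critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
        rejections = sum(1 for t in t_stats if abs(t) > critical)
        rates[alpha] = rejections / len(t_stats)
    return rates


if __name__ == "__main__":
    # Four toy replications with unit standard errors and true value 1.
    print(rejection_rates([1.0, 1.5, 3.0, 1.0], [1.0, 1.0, 1.0, 1.0], 1.0))
```

A test has correct size when each returned rate is close to its nominal level.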

TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$, $K(\nu) = \phi(\nu)$

Columns: $\hat\beta_n$ (Without Asymptotic Bias Correction): 0.01, 0.05, 0.10, 0.20. $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction): 0.01, 0.05, 0.10, 0.20.

Panel A: True $\gamma$

0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B: $\hat\gamma_{L}$

0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C: $\hat\gamma_{CMS}$

0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D: $\hat\gamma_{SCMS}$

0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250

5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.
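To make the second step concrete, the sketch below computes a kernel-weighted least squares estimate on first differences for a single regressor. It is an illustration under stated assumptions, not the paper's implementation: the weight $(1/h_n)K(\Delta w_i\hat\gamma_n/h_n)$ with a standard normal kernel, the restriction to individuals observed in both periods, and all names (`kernel_weighted_fd_ols`, `index_diff`, `both_observed`) are choices made here for exposition.

```python
import math


def normal_kernel(v):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def kernel_weighted_fd_ols(dy, dx, index_diff, both_observed, h):
    """Scalar-regressor sketch of weighted least squares on first differences.

    dy, dx        : first differences of outcome and regressor
    index_diff    : first-step estimate of the selection index difference
    both_observed : True when the outcome is observed in both periods
    h             : bandwidth; observations whose index difference is far
                    from zero (relative to h) receive almost no weight
    """
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    numerator = 0.0
    denominator = 0.0
    for dy_i, dx_i, w_i, observed in zip(dy, dx, index_diff, both_observed):
        if not observed:
            continue  # differencing needs the outcome in both periods
        weight = normal_kernel(w_i / h) / h
        numerator += weight * dx_i * dy_i
        denominator += weight * dx_i * dx_i
    if denominator == 0.0:
        raise ValueError("no usable observations near a zero index difference")
    return numerator / denominator


if __name__ == "__main__":
    dx = [1.0, -2.0, 0.5, 3.0]
    dy = [2.0, -4.0, 1.0, -300.0]      # last pair violates dy = 2 * dx
    index_diff = [0.0, 0.1, -0.05, 50.0]  # ...but its index difference is large
    print(kernel_weighted_fd_ols(dy, dx, index_diff, [True] * 4, h=0.5))
```

The last observation has a large index difference, so its kernel weight is negligible and it does not disturb the estimate. This is the sense in which only pairs of periods with nearly equal selection effects contribute to differencing.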

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMA A1: Let $S_n = (1/n)\sum_{i=1}^{n}(1/h_n)L(W_i/h_n)Z_iW_i^{s}$, $s \ge 0$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution that has $E(|Z|^{2} \mid W) < M < \infty$ for almost all $W$, and the function $L$ satisfies $\int |\nu^{s}L(\nu)|\,d\nu < M$. Then $E(S_n) = O(h_n^{s})$ and $\mathrm{var}(S_n) = O(h_n^{2s}/(nh_n))$. Thus, for $s \ge 1$, $S_n \to_p 0$, while for $s = 0$, $S_n \to_p f_W(0)E(Z \mid W = 0)\int L(\nu)\,d\nu$, provided that $E(Z \mid W)$ is continuous at $W = 0$.

PROOF: Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.
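The probability limits in Lemma A1 can be checked numerically. The sketch below assumes the kernel average has the form $S_n = (1/n)\sum_i (1/h_n)L(W_i/h_n)Z_iW_i^{s}$, takes $L$ to be the standard normal density, and uses a hypothetical design ($W \sim N(0,1)$, $Z = 1 + W + e$ with $e \sim N(0,1)$, $h_n = n^{-1/5}$). Under that design $f_W(0) = 1/\sqrt{2\pi}$, $E(Z \mid W = 0) = 1$, and $\int L(\nu)\,d\nu = 1$, so the $s = 0$ limit is about 0.399 and the $s = 1$ limit is 0.

```python
import math
import random


def normal_pdf(v):
    """Standard normal density, used both as L and as the density of W."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def kernel_average(w, z, h, s=0, kernel=normal_pdf):
    """Compute (1/n) * sum_i (1/h) * L(w_i / h) * z_i * w_i**s."""
    n = len(w)
    total = sum(kernel(wi / h) * zi * wi ** s for wi, zi in zip(w, z))
    return total / (n * h)


def simulate(n=20000, s=0, seed=0):
    """Draw the hypothetical design described above and return S_n."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [1.0 + wi + rng.gauss(0.0, 1.0) for wi in w]
    h = n ** (-0.2)
    return kernel_average(w, z, h, s)


if __name__ == "__main__":
    print("s = 0:", simulate(s=0), "limit:", 1.0 / math.sqrt(2.0 * math.pi))
    print("s = 1:", simulate(s=1), "limit: 0")
```

With $s = 0$ the average settles near $f_W(0)E(Z \mid W = 0)$, while the extra factor $W_i^{s}$ with $s \ge 1$ shrinks the average toward zero at rate $h_n^{s}$.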

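LEMMA A2 (Liapounov CLT for double arrays): Let $S_n = (1/\sqrt{n})\sum_{i=1}^{n}\xi_{in}$, where $\{\xi_{in}\}$ is an independent sequence of scalar random variables that satisfies $E(\xi_{in}) = 0$, $\mathrm{var}(\xi_{in}) < \infty$, $\mathrm{var}(S_n) \to V < \infty$, and $\sum_{i=1}^{n}E|\xi_{in}/\sqrt{n}|^{2+\delta} \to 0$ for some $\delta \in (0,1)$, as $n \to \infty$. Then $S_n \to_d N(0, V)$.

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1974).

COROLLARY A1: Let $\xi_{in} = (1/\sqrt{h_n})L(W_i/h_n)Z_i$, where $\{(Z_i, W_i)\}_{i=1}^{n}$ is a random sample from a distribution such that $E(Z \mid W) = 0$ and $E(|Z|^{2+\delta} \mid W) < M < \infty$ for almost all $W$, $E(Z^{2} \mid W)$ is continuous at $W = 0$, and the function $L$ satisfies $\int |L(\nu)|^{2+\delta}\,d\nu < \infty$. Then $\sqrt{nh_n}\,S_n = (1/\sqrt{n})\sum_{i=1}^{n}\xi_{in} \to_d N\big(0,\ f_W(0)E(Z^{2} \mid W = 0)\int L(\nu)^{2}\,d\nu\big)$.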

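PROOF OF LEMMA 1: (a) Apply Lemma A1 with $Z_i = \Delta x_{ij}\,\Delta x_{il}\,\Phi_i$ ($j, l = 1, \ldots, k$), $s = 0$, and $L(\nu) = K(\nu)$.

(b-i) Apply Lemma A2 with $\xi_{in} = c'(1/\sqrt{h_n})K(W_i/h_n)\,\Delta x_i'\,\Delta\varepsilon_i\,\Phi_i$, where $c$ is a $k \times 1$ vector of constants such that $c'c = 1$.

(b-ii) Note that by Assumption R5, $\Delta\lambda_i = \Lambda_i W_i$. Thus we may write

$$S_{x\lambda} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}K\!\left(\frac{W_i}{h_n}\right)\Delta x_i'\,\Phi_i\,\Lambda_i W_i .$$

Therefore $E(S_{x\lambda}) = \int (1/h_n)K(W/h_n)\,W\,g(W)\,dW$, where $g(W) \equiv E(\Delta x'\,\Phi\,\Lambda \mid W)f_W(W)$ is by assumption $r$ times continuously differentiable with derivatives that are bounded on the support of $W$, and has $g(0) < \infty$. A Taylor series expansion of $g(\cdot)$ around 0 and a change of variables $W = \nu h_n$ lead to

for some $c_i^{*}$ lying between 0 and $W$, since $\int \nu^{j}K(\nu)\,d\nu = 0$ for $j = 1, \ldots, r$. Therefore, by bounded convergence,

since under our assumptions $\int |\nu^{r+1}K(\nu)|\,d\nu < \infty$ and by assumption $\sqrt{nh_n}\,h_n^{r+1} \to \tilde{h}$. Furthermore, by Lemma A1, $\mathrm{var}(S_{x\lambda}) = O(h_n^{2}/(nh_n))$, which implies that $\mathrm{var}(\sqrt{nh_n}\,S_{x\lambda}) = O(nh_n)\,O(h_n/n) = O(h_n^{2}) = o(1)$. Hence $\sqrt{nh_n}\,S_{x\lambda} \to_p \tilde{h}\,\Sigma_{x\lambda}$.

(c-i) Note that

while by Lemma A1, $\mathrm{var}(S_{x\varepsilon}) = O((nh_n)^{-1})$. Therefore $E(h_n^{-(r+1)}S_{x\varepsilon}) = 0$ and $\mathrm{var}(h_n^{-(r+1)}S_{x\varepsilon}) = O(h_n^{-2(r+1)}(nh_n)^{-1}) = O((nh_n^{2(r+1)+1})^{-1}) = o(1)$, since by assumption $\sqrt{nh_n}\,h_n^{r+1} \to \infty$ as $n \to \infty$. Thus $h_n^{-(r+1)}S_{x\varepsilon} \to_p 0$.

(c-ii) From part (b-ii) above,

and

since $nh_n^{2(r+1)+1} \to \infty$ implies that $nh_n \to \infty$. Thus $h_n^{-(r+1)}S_{x\lambda} \to_p \Sigma_{x\lambda}$.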

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$.

(iii) In the Taylor series expansions, $c_i^{*}$ stands for a generic value between $W_i$ and

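PROOF OF LEMMA 2: (a) By a Taylor series expansion we can write

Therefore

since by assumption $\mu < p/2$, $|K'(\nu)| < M$, and $E(\|\Delta w\|\,\|\Delta x\|^{2}) < \infty$.

(b-i) Let $\hat{S}^{l}_{x\varepsilon}$ and $S^{l}_{x\varepsilon}$ denote the $l$th ($l = 1, \ldots, k$) elements of $\hat{S}_{x\varepsilon}$ and $S_{x\varepsilon}$, respectively. A third order Taylor series expansion yields

$$\sqrt{nh_n}\,(\hat{S}^{l}_{x\varepsilon} - S^{l}_{x\varepsilon}) = A_1\frac{(\hat\gamma_n - \gamma)}{h_n} + \frac{(\hat\gamma_n - \gamma)'}{h_n}A_2\frac{(\hat\gamma_n - \gamma)}{h_n} + A_3,$$

where

$$A_3 = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{1}{6\sqrt{h_n}}K'''(c_i^{*})\,\Delta x_{il}\,\Delta\varepsilon_i\,\Phi_i\,\frac{[\Delta w_i(\hat\gamma_n - \gamma)]^{3}}{h_n^{3}} .$$

We will show that $A_1$ and $A_2$ are $O_p(1)$, while $A_3 = o_p(1)$. The desired result will then follow from the fact that $\mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$.

Let $A_{1j}$ be the $j$th element ($j = 1, \ldots, q$) of the $(1 \times q)$ vector $A_1$. Write $A_{1j} = (1/\sqrt{n})\sum_{i=1}^{n}\xi_{in}$, where $\xi_{in} = (1/\sqrt{h_n})K'(W_i/h_n)\,\Delta x_{il}\,\Delta\varepsilon_i\,\Phi_i\,\Delta w_{ij}$. Note that $\{\xi_{in}\}$ is a sequence of scalar random variables that satisfies the requirements of Lemma A2, since under our assumptions $E(|\Delta x_l\,\Delta\varepsilon\,\Delta w_j|^{2+\delta} \mid W) < M$ for almost all $W$, while $|K'(\nu)| < M$ and $\int |K'(\nu)|\,d\nu < \infty$ imply that $\int |K'(\nu)|^{2+\delta}\,d\nu < \infty$. Therefore $A_{1j}$ is bounded in probability.

Similarly, we can show that the $jm$th element ($j, m = 1, \ldots, q$) of the $(q \times q)$ matrix $A_2$ is also bounded in probability, by defining $\xi_{in} = (1/\sqrt{h_n})K''(W_i/h_n)\,\Delta x_{il}\,\Delta\varepsilon_i\,\Phi_i\,\Delta w_{ij}\,\Delta w_{im}$, since $E(|\Delta x_l\,\Delta\varepsilon\,\Delta w_j\,\Delta w_m|^{2+\delta} \mid W) < M$ for almost all $W$, and the boundedness and absolute integrability of $K''(\nu)$ implies that $\int |K''(\nu)|^{2+\delta}\,d\nu < \infty$.

Next observe that

$$\|A_3\| \le M\,\frac{\sqrt{n}}{h_n^{7/2}}\,\|\hat\gamma_n - \gamma\|^{3}\,\frac{1}{n}\sum_{i=1}^{n}\|\Delta w_i\|^{3}\,|\Delta x_{il}\,\Delta\varepsilon_i| = O_p\!\left(n^{(1/2)+(7\mu/2)-3p}\right) = o_p(1),$$

since $p > 2/5$ and $\mu < p/2$ imply that $(1/2) + (7\mu/2) - 3p < 0$.

(b-ii) Let $\hat{S}^{l}_{x\lambda}$ and $S^{l}_{x\lambda}$ denote the $l$th ($l = 1, \ldots, k$) elements of $\hat{S}_{x\lambda}$ and $S_{x\lambda}$, respectively. A third order Taylor series expansion yields

$$\sqrt{nh_n}\,(\hat{S}^{l}_{x\lambda} - S^{l}_{x\lambda}) = B_1\sqrt{nh_n}\,(\hat\gamma_n - \gamma) + \sqrt{nh_n}\,(\hat\gamma_n - \gamma)'B_2\frac{(\hat\gamma_n - \gamma)}{h_n} + B_3,$$

where

$$B_3 = \sqrt{nh_n}\,\frac{1}{n}\sum_{i=1}^{n}\frac{1}{6h_n}K'''(c_i^{*})\,\Delta x_{il}\,\Delta\lambda_i\,\Phi_i\,\frac{[\Delta w_i(\hat\gamma_n - \gamma)]^{3}}{h_n^{3}} .$$

We will show that $B_1$ and $B_2$ are $O_p(1)$, while $B_3 = o_p(1)$. The desired result will then follow from the fact that $1 - 2p < \mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$ and $\sqrt{nh_n}\,(\hat\gamma_n - \gamma) = O_p(n^{(1-\mu)/2 - p}) = o_p(1)$.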

Note that $B_1$ is a $(1 \times q)$ row-vector. For its $j$th element, application of Lemma A1 with $s = 1$, $Z_i = \Delta x_{il}\,\Lambda_i\,\Delta w_{ij}\,\Phi_i$, and $L(\nu) = K'(\nu)$ yields

$$E(B_{1j}) = \frac{1}{h_n}O(h_n) = O(1) \quad\text{and}\quad \mathrm{var}(B_{1j}) = \frac{1}{h_n^{2}}O\!\left(\frac{h_n^{2}}{nh_n}\right) = o(1),$$

since $E(\Delta x_l^{2}\,\Lambda^{2}\,\Delta w_j^{2} \mid W) < \infty$ for almost all $W$ and $\int |\nu K'(\nu)|^{2}\,d\nu < \infty$.

Similarly, we can show that the $jm$th element ($j, m = 1, \ldots, q$) of the $(q \times q)$ matrix $B_2$ is also bounded in probability, since $E(\Delta x_l^{2}\,\Lambda^{2}\,\Delta w_j^{2}\,\Delta w_m^{2} \mid W) < \infty$ for almost all $W$ and $\int |\nu K''(\nu)|^{2}\,d\nu < \infty$.

Next observe that $B_3 = o_p(1)$, since under our assumptions $(1/2) + (7\mu/2) - 3p < 0$, $\gamma$ lies in a compact set, and $E(\|\Delta x\|\,\|\Delta w\|^{4}) < \infty$.

(c-i) Note that with $h_n = h \cdot n^{-\mu}$, the condition $nh_n^{2(r+1)+1} \to \infty$ implies that $\mu < 1/(2(r+1)+1)$.

In what follows we will use the fact that, for $r \ge 1$,

Define $\hat{S}^{l}_{x\varepsilon}$ and $S^{l}_{x\varepsilon}$ as before. A third order Taylor series expansion yields

$$h_n^{-(r+1)}(\hat{S}^{l}_{x\varepsilon} - S^{l}_{x\varepsilon}) = \frac{1}{\sqrt{nh_n}\,h_n^{r+1}}A_1\frac{(\hat\gamma_n - \gamma)}{h_n} + \frac{1}{\sqrt{nh_n}\,h_n^{r+1}}\frac{(\hat\gamma_n - \gamma)'}{h_n}A_2\frac{(\hat\gamma_n - \gamma)}{h_n} + A_4,$$

where $A_1$ and $A_2$ are defined as in the proof of part (b-i), and $A_4$ denotes the third order remainder term. As we showed there, both these quantities are bounded in probability for any $h_n$ that satisfies $h_n \to 0$ and $nh_n \to \infty$ as $n$ increases. Furthermore, from (1) above, $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$. Thus the first two terms of the sum above are $o_p(1)$. Now by (2),

(c-ii) Let $\hat{S}^{l}_{x\lambda}$ and $S^{l}_{x\lambda}$ be defined as before. A third order Taylor series expansion yields

where $B_1$ and $B_2$ are defined as in the proof of part (b-ii) and, as we showed there, they are bounded in probability for any $h_n$ that satisfies $nh_n \to \infty$ as $n$ increases. Thus the first two terms of the sum above are $o_p(1)$. Furthermore,

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of A Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.


Page 23: Estimation of a Panel Data Sample Selection Model ... · The JSTOR Archive is a trusted digital repository providing for long-term preservation and access to leading academic journals

EKATERINI KYRIAZIDOU

TABLE V

Iilitial Initial Initial Initial h = 05 h = l 11 = 2 h = 3

result of Theorem 1It thus appears that for the particular design small sample bias is more important than asymptotic bias The sensitivity of the optimal constant estimate A to the choice of the initial constant suggests that further research on alternative methods for choosing the bandwidth may be warranted

We next investigate whether normality might be a good approximation to the finite sample distribution of the proposed estimator In Figure 1 we plot the quantiles of b against those of a normal random variable with the same mean and variance as the sample mean and sample variance of p Such quantile- quantile plots are provided for different sample sizes and for the true and the

True y

5

0 5 1 15 2 0 5 1 1 5 2 0 5 1 15 2 Flg l a Fig 1b Fig l c

Note Figures la Id lg n = 250 Figures lb le lh n = 1000 Figures lc If li tl = 4000

FIGURE 1-Quantile-quantile plots of inagainst a Normal h = n-~(v) = $(v)

1357 SAMPLE SELECTION MODEL

estimated values of y using the specification of Table I1 (that is using a second order kernel and h =n-I5) We find that for the experimental design used in this study the small sample distribution of the proposed estimator is well approximated by a normal distribution The plots for the asymptotic bias-cor- rected estimator are very similar albeit displaying a larger dispersion and are not given here

Finally we examine the size of t tests where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2 Specifically in Table VI we test the null hypothesis that P isAequal to its true value P o= 1 To this end we construct t statistics for 1 and 1 for the specification of Table I1 (that is using a second order kernel and h =n- I5) Standard errors are constructed using the estimator given by equation (322) The table presents the fraction of samples for which the null hypothesis is rejected at the 1 5 10 and 20 percent statistical significance level We find that the actual levels of the tests are not far from the nominal levels especially for larger sample sizes and that they are closer for the estimates without the asymptotic bias correction Note that although we report the results of the t tests for bn using Manskis CMS estimator in the first step (Panel VI-C) the standard errors calculated for the two-step estimator of the main equation are only heuristic since as discussed in

R Section 32 the asymptotic normality of fin (and P) does not obtain in this case due to the slow rate of convergence of yc However the levels of the tests even in this case are reasonable Alternatively we could have used bootstrap standard errors

TABLE VI

SIZEOF t TESTSUSINGfin AND b h = n- K ( u )= 4 ( u )

b k(Without Anymptotic Bias Correction) (With Asymptotic Bias Correction)

001 005 010 020 001 005 010 020

Panel A True y 01610 02530 00590 01240 02180 00260 01120 02260 00210

Panel B TL 01580 02680 00450 01160 02140 00230 01140 02250 00180

Panel C Scnfs 01600 02720 00610 01170 02160 00350 01180 02390 00240

Panel D SScMS 01430 02570 00280 01220 02250 00190 01230 02430 00250

1358 EKATERINI KYRIAZIDOU

5 CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects We developed a two-step estimation procedure for the parameters of the regression equation of interest which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect in a manner similar to the fixed-effects approach taken in linear panel data models The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets However it is quite sensitive to the choice of the bandwidth parameter which suggests that further research on this issue may be warranted Two more issues will be also left for future investigation

First notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation which could be used to develop a Least Absolute Deviations-type estimator This estimator might then be com- bined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations Furthermore LAD estimators might be preferable in the case of heavy-tailed distributions but they do not have closed-form solutions and their asymptotic properties are more difficult to derive

Second although the analysis rested on the strict exogeneity of the explana- tory variables in both equations it is possible to allow for lagged endogenous variables in the set of regressors Honor6 and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors individual effects and lags of the dependent discrete variable Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors individual effects and lags of the dependent endogenous variables

Department of Economics Uniuersity of Chicago 1126 E 59th St Chicago Illinois 60637 U SA

Maizuscrrpt receiced May 1994 final reL ision receiced January 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas which maintain Assumptions R4 and R8 of Section 3

LEMMAAl Let S = is a random sam- (ln)Z=l ( l h )L (M h )Z v s 2 0 where (Z y)]= ple from a disirrbuiron that has E ( I Z I ~ I for L ~at~sfiesW )ltM lt almost all W and the functron

P ~ l v(v ) l d v lt M Then E ( S ) = O ( k i ) and var(S) = O(hnh) Tlzus for s 2 1 S + 0 while for

P s = 0 S +f(O)E(ZI W = O)lL(v)d v procrded that E ( Z I W ) rs contrnuo~ts at W = 0

SAMPLE SELECTION MODEL

PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshevs theorem

LEMMAA2 (Liapounov CLT for doublc arrays) Let = (1 l t l )~= I tiwhere an Independent sequence of scalar random ~arrables that satis$es E( (I0 var( (I lt rn var(= +

V lt aand I3= El ( 61 +0 for some 8 E (01) as n + Then Jizh~N(0 V)

PROOF See Theorem 712 and comment on pagc 209 in Chung (1973)

COROLLARY = where (Z U)l= 1s a random sample from a Al Let ( ( I amp)L(w~)z d~stnbutlonsuch that E(ZI W) = 0 and E(IZI 1 W) lt M lt w for almost all W E(Z2 I W) IS conhnuous at W = 0 and the functlon L satrsfies llL(v)l dv lt 53 Then KS= ( l ix)~l=amp N(0

f W ( 0 ) ~ ( Z 2 I ~ =O ) ~ L ( V ) ~ ~ V )

PROOFOF LEMMA1 (a) Apply Lemma A1 with 2= Ax Ax di (1 j = I k) s = 0 and L(v ) = K(v)

(b-i) Apply Lemma A2 with tt= c1(1 amp)K(Uh) Ax Ac where c is a k X 1 vector of constants such that cc = 1

(b-ii) Note that by Assumption R5 Ah = AWThus wc may write

S = Ax( 1 ~ 1 ) I 3 ~ = ~ ( l h ) K ( H ( h )

Therefore E(S) = l(lh)K(Wh)Wg(W) dW where g(W) - E(Axr AlW)fw(W) is by assumption r times colltinuously differentiable with derivatives that are bounded on the support of W and has g(0) lt m A Taylor series expansion of g() around 0 and a change of variables W = vh lead to

17

1

1360 EKATERINI KYRIAZIDOU

for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKSii) In what follows A4 stands for a generic constant which is the uppcr bound of certain quantities

(ii) We define the matrix norm IIAll= dtrace(AA) (iii) In the Taylor series expansions c stands for a generic value between U and

PROOFOF LEMMA2 (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a

SAMPLE SELECTION MODEL 1361

(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that A and A are 0(1) while A = o(l) The desired result will then follow from the fact that p lt p 2 implies that h i 1 ( - y ) = Op(niL-1= o(l)

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

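(b-ii) Let $\hat{s}^l_{x\lambda}$ and $s^l_{x\lambda}$ denote the $l$th ($l = 1, \ldots, k$) elements of $\hat{S}_{x\lambda}$ and $S_{x\lambda}$, respectively. A third order Taylor series expansion yields

$$
\begin{aligned}
\sqrt{n h_n}\,(\hat{s}^l_{x\lambda} - s^l_{x\lambda})
&= \frac{1}{n} \sum_{i=1}^n \frac{1}{h_n^2} K'\!\left(\frac{W_i}{h_n}\right) \Delta x_i^l\, \Delta\lambda_i\, \Phi_i\, \Delta w_i\, \sqrt{n h_n}\,(\hat\gamma_n - \gamma) \\
&\quad + \frac{1}{2}\, \sqrt{n h_n}\,(\hat\gamma_n - \gamma)' \left[\frac{1}{n} \sum_{i=1}^n \frac{1}{h_n^2} K''\!\left(\frac{W_i}{h_n}\right) \Delta x_i^l\, \Delta\lambda_i\, \Phi_i\, \Delta w_i' \Delta w_i\right] h_n^{-1}(\hat\gamma_n - \gamma) \\
&\quad + \frac{1}{6\sqrt{n}} \sum_{i=1}^n \frac{1}{h_n^{7/2}} K'''\!\left(\frac{c^*_{in}}{h_n}\right) \Delta x_i^l\, \Delta\lambda_i\, \Phi_i \left(\Delta w_i (\hat\gamma_n - \gamma)\right)^3 \\
&\equiv B_1\, \sqrt{n h_n}\,(\hat\gamma_n - \gamma) + \tfrac{1}{2}\, \sqrt{n h_n}\,(\hat\gamma_n - \gamma)'\, B_2\, h_n^{-1}(\hat\gamma_n - \gamma) + B_3 .
\end{aligned}
$$

We will show that $B_1$ and $B_2$ are $O_p(1)$, while $B_3 = o_p(1)$. The desired result will then follow from the fact that $1 - 2p < \mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$ and $\sqrt{n h_n}\,(\hat\gamma_n - \gamma) = O_p(n^{(1-\mu)/2 - p}) = o_p(1)$.

Note that $B_1$ is a $(1 \times q)$ row-vector. For its $j$th element, application of Lemma A1 with $s = 1$, $Z_i = \Delta x_i^l\, \Lambda_i\, \Phi_i\, \Delta w_i^j$, and $L(v) = K'(v)$ yields

$$
E(B_1^j) = \frac{1}{h_n}\, O(h_n) = O(1) \quad \text{and} \quad \mathrm{var}(B_1^j) = \frac{1}{h_n^2}\, O\!\left(\frac{h_n^2}{n h_n}\right) = o(1),
$$

since $E(|\Delta x^l \Lambda\, \Delta w^j|^2 \mid W) < \infty$ for almost all $W$ and $\int |v K'(v)|\, dv < \infty$.

Similarly, we can show that the $(j, m)$th element ($j, m = 1, \ldots, q$) of the $(q \times q)$ matrix $B_2$ is also bounded in probability, since $E(|\Delta x^l \Lambda\, \Delta w^j \Delta w^m|^2 \mid W) < \infty$ for almost all $W$ and $\int |v K''(v)|\, dv < \infty$.

Next observe that

$$
\|B_3\| \le M \sqrt{n}\, \frac{1}{h_n^{7/2}}\, \|\hat\gamma_n - \gamma\|^3\, \frac{1}{n} \sum_{i=1}^n |\Delta x_i^l\, \Delta\lambda_i|\, \|\Delta w_i\|^3 = o_p(1),
$$

since under our assumptions $(1/2) + (7\mu/2) - 3p < 0$, $\gamma$ lies in a compact set, and $E(\|\Delta x\|\, \|\Delta w\|^4) < \infty$.

(c-i) Note that with $h_n = h\, n^{-\mu}$, the condition $n h_n^{2(r+1)+1} \to \infty$ implies that $\mu < 1/(2(r+1)+1)$.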

In what follows we will use the fact that for $r \ge 1$

Define $\hat{s}^l_{x\nu}$ and $s^l_{x\nu}$ as before. A third order Taylor series expansion yields

$$
\frac{1}{h_n^{r+1}}\,(\hat{s}^l_{x\nu} - s^l_{x\nu})
= \frac{1}{\sqrt{n h_n}\, h_n^{r+1}} \left[ A_1\, \frac{1}{h_n}(\hat\gamma_n - \gamma) + \frac{1}{2}\, \frac{1}{h_n}(\hat\gamma_n - \gamma)'\, A_2\, \frac{1}{h_n}(\hat\gamma_n - \gamma) \right] + A_4,
$$

$$
A_4 = \frac{1}{6n} \sum_{i=1}^n \frac{1}{h_n^{r+5}} K'''\!\left(\frac{c^*_{in}}{h_n}\right) \Delta x_i^l\, \Delta\nu_i\, \Phi_i \left(\Delta w_i (\hat\gamma_n - \gamma)\right)^3,
$$

where $A_1$ and $A_2$ are defined as in the proof of part (b-i). As we showed there, both these quantities are bounded in probability for any $h_n$ that satisfies $h_n \to 0$ and $n h_n \to \infty$ as $n$ increases. Furthermore, from (i) above, $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$. Thus, the first two terms of the sum above are $o_p(1)$. Now, by (ii),

(c-ii) Let $\hat{s}^l_{x\lambda}$ and $s^l_{x\lambda}$ be defined as before. A third order Taylor series expansion yields

where $B_1$ and $B_2$ are defined as in the proof of part (b-ii), and as we showed there, they are bounded in probability for any $h_n$ that satisfies $n h_n \to \infty$ as $n$ increases. Thus, the first two terms of the sum above are $o_p(1)$. Furthermore,

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics, Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HARDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of A Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.



estimated values of y using the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). We find that, for the experimental design used in this study, the small sample distribution of the proposed estimator is well approximated by a normal distribution. The plots for the asymptotic bias-corrected estimator are very similar, albeit displaying a larger dispersion, and are not given here.

Finally, we examine the size of t tests where the test statistics use the asymptotic covariance matrix estimator proposed in Theorem 2. Specifically, in Table VI we test the null hypothesis that $\beta$ is equal to its true value, $\beta_0 = 1$. To this end we construct t statistics for $\hat\beta_n$ and its bias-corrected counterpart $\hat{\hat\beta}_n$ for the specification of Table II (that is, using a second order kernel and $h_n = n^{-1/5}$). Standard errors are constructed using the estimator given by equation (3.22). The table presents the fraction of samples for which the null hypothesis is rejected at the 1, 5, 10, and 20 percent statistical significance levels. We find that the actual levels of the tests are not far from the nominal levels, especially for larger sample sizes, and that they are closer for the estimates without the asymptotic bias correction. Note that although we report the results of the t tests using Manski's CMS estimator in the first step (Panel VI-C), the standard errors calculated for the two-step estimator of the main equation are only heuristic, since, as discussed in Section 3.2, the asymptotic normality of $\hat\beta_n$ (and $\hat{\hat\beta}_n$) does not obtain in this case, due to the slow rate of convergence of $\hat\gamma_n$. However, the levels of the tests even in this case are reasonable. Alternatively, we could have used bootstrap standard errors.

TABLE VI

SIZE OF t TESTS USING $\hat\beta_n$ AND $\hat{\hat\beta}_n$; $h_n = n^{-1/5}$; $K(u) = \phi(u)$

$\hat\beta_n$ (Without Asymptotic Bias Correction) | $\hat{\hat\beta}_n$ (With Asymptotic Bias Correction)

0.01 0.05 0.10 0.20 | 0.01 0.05 0.10 0.20

Panel A: True $\gamma$: 0.1610 0.2530 0.0590 0.1240 0.2180 0.0260 0.1120 0.2260 0.0210

Panel B: TL: 0.1580 0.2680 0.0450 0.1160 0.2140 0.0230 0.1140 0.2250 0.0180

Panel C: CMS: 0.1600 0.2720 0.0610 0.1170 0.2160 0.0350 0.1180 0.2390 0.0240

Panel D: SCMS: 0.1430 0.2570 0.0280 0.1220 0.2250 0.0190 0.1230 0.2430 0.0250
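The size calculation reported in Table VI amounts to counting how often a t statistic exceeds a critical value across Monte Carlo replications. A minimal sketch of that bookkeeping follows. It assumes two-sided tests with standard normal critical values; the function names and the four replications are hypothetical and are not taken from the paper's simulations.

```python
from statistics import NormalDist


def t_statistic(estimate, true_value, std_error):
    """t statistic for the null hypothesis that the parameter equals true_value."""
    return (estimate - true_value) / std_error


def rejection_rate(t_stats, level):
    """Fraction of replications in which a two-sided test rejects at `level`."""
    if not t_stats:
        raise ValueError("need at least one replication")
    # Two-sided standard normal critical value (about 1.96 for level = 0.05).
    critical = NormalDist().inv_cdf(1.0 - level / 2.0)
    rejections = sum(1 for t in t_stats if abs(t) > critical)
    return rejections / len(t_stats)


# Hypothetical replications: (estimate, standard error) pairs with true beta = 1.
replications = [(1.05, 0.10), (0.70, 0.12), (1.30, 0.11), (0.98, 0.09)]
t_stats = [t_statistic(b, 1.0, se) for b, se in replications]

# Empirical size at each nominal level used in the table.
sizes = {level: rejection_rate(t_stats, level) for level in (0.01, 0.05, 0.10, 0.20)}
```

With many replications, an empirical size close to the nominal level indicates that the asymptotic standard errors are a reasonable guide in samples of that size.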


5. CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects. We developed a two-step estimation procedure for the parameters of the regression equation of interest, which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect, in a manner similar to the fixed-effects approach taken in linear panel data models. The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets. However, it is quite sensitive to the choice of the bandwidth parameter, which suggests that further research on this issue may be warranted. Two more issues will also be left for future investigation.

First, notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation, which could be used to develop a Least Absolute Deviations-type estimator. This estimator might then be combined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations. Furthermore, LAD estimators might be preferable in the case of heavy-tailed distributions, but they do not have closed-form solutions and their asymptotic properties are more difficult to derive.

Second, although the analysis rested on the strict exogeneity of the explanatory variables in both equations, it is possible to allow for lagged endogenous variables in the set of regressors. Honoré and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors, individual effects, and lags of the dependent discrete variable. Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors, individual effects, and lags of the dependent endogenous variables.
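The second step of the two-step procedure summarized in this section can be sketched in code. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a single regressor, a Gaussian kernel, a user-supplied bandwidth `h`, and a first-step estimate of the change in the selection index (`index_diff`) that is taken as given. All function and variable names are hypothetical.

```python
import math


def gaussian_kernel(v):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def kernel_weighted_fd(dy, dx, index_diff, observed_both, h):
    """Kernel-weighted least squares on first differences, one regressor.

    dy, dx        : first differences of the outcome and of the regressor
    index_diff    : first-step estimate of the change in the selection index
    observed_both : True when the outcome is observed in both periods
    h             : bandwidth (must be positive)
    """
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    numerator = 0.0
    denominator = 0.0
    for dyi, dxi, wi, both in zip(dy, dx, index_diff, observed_both):
        if not both:
            continue  # units not selected in both periods contribute nothing
        # The weight is largest when the selection index barely changes,
        # which is when differencing also removes the selection effect.
        weight = gaussian_kernel(wi / h) / h
        numerator += weight * dxi * dyi
        denominator += weight * dxi * dxi
    if denominator == 0.0:
        raise ValueError("no usable observations")
    return numerator / denominator


# Hypothetical data in which dy = 2 * dx exactly for the selected units.
dx = [1.0, 2.0, -1.0, 0.5]
dy = [2.0, 4.0, -2.0, 9.0]
index_diff = [0.10, -0.20, 0.05, 0.00]
observed_both = [True, True, True, False]
beta_hat = kernel_weighted_fd(dy, dx, index_diff, observed_both, h=0.5)
```

Shrinking `h` concentrates the weight on units whose selection index is nearly unchanged. This lowers bias but leaves fewer effective observations, which is one way to read the bandwidth sensitivity noted above.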

Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, Illinois 60637, U.S.A.

Manuscript received May, 1994; final revision received January, 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas, which maintain Assumptions R4 and R8 of Section 3.

LEMMA A1: Let $S_n = \frac{1}{n} \sum_{i=1}^n \frac{1}{h_n} L(W_i/h_n)\, Z_i\, W_i^s$, $s \ge 0$, where $\{(Z_i, W_i)\}_{i=1}^n$ is a random sample from a distribution that has $E(|Z|^2 \mid W) < M < \infty$ for almost all $W$, and the function $L$ satisfies $\int |v^s L(v)|\, dv < M$. Then $E(S_n) = O(h_n^s)$ and $\mathrm{var}(S_n) = O(h_n^{2s}/(n h_n))$. Thus, for $s \ge 1$, $S_n \to_p 0$, while for $s = 0$, $S_n \to_p f_W(0)\, E(Z \mid W = 0) \int L(v)\, dv$, provided that $E(Z \mid W)$ is continuous at $W = 0$.

PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshev's theorem.
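The two cases of Lemma A1 can be illustrated numerically. The sketch below is an assumption-laden check, not part of the proof: it takes $W$ standard normal, $Z = 1 + W + \text{noise}$, a Gaussian function $L$, and $h_n = n^{-1/5}$, so that for $s = 0$ the limit $f_W(0)\, E(Z \mid W = 0) \int L(v)\, dv$ equals $1/\sqrt{2\pi}$. All names and the simulated design are hypothetical.

```python
import math
import random


def gaussian_kernel(v):
    """Standard normal density, used here as the function L."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def kernel_average(z, w, h, s=0):
    """S_n = (1/n) * sum_i (1/h) * L(W_i / h) * Z_i * W_i**s."""
    total = 0.0
    for zi, wi in zip(z, w):
        total += gaussian_kernel(wi / h) / h * zi * wi ** s
    return total / len(z)


def simulate(n, seed):
    """Draw W ~ N(0, 1) and Z = 1 + W + noise, so that E(Z | W = 0) = 1."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [1.0 + wi + rng.gauss(0.0, 1.0) for wi in w]
    return z, w


n = 20000
z, w = simulate(n, seed=0)
h = n ** (-0.2)

s0 = kernel_average(z, w, h, s=0)  # should be near f_W(0) * E(Z | W = 0)
s1 = kernel_average(z, w, h, s=1)  # should be near zero
limit_s0 = 1.0 / math.sqrt(2.0 * math.pi)
```

In this design `s0` should settle near `limit_s0` (about 0.399) and `s1` near zero as `n` grows, in line with the $s = 0$ and $s \ge 1$ cases of the lemma.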

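LEMMA A2 (Liapounov CLT for double arrays): Let $\zeta_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n \xi_{in}$, where $\{\xi_{in}\}$ is an independent sequence of scalar random variables that satisfies $E(\xi_{in}) = 0$, $\mathrm{var}(\xi_{in}) < \infty$, $\mathrm{var}(\zeta_n) \to V < \infty$, and $\sum_{i=1}^n E|\xi_{in}/\sqrt{n}|^{2+\delta} \to 0$ for some $\delta \in (0, 1)$, as $n \to \infty$. Then $\zeta_n \to_d N(0, V)$.

PROOF: See Theorem 7.1.2 and comment on page 209 in Chung (1974).

COROLLARY A1: Let $\xi_{in} = \frac{1}{\sqrt{h_n}} L(W_i/h_n)\, Z_i$, where $\{(Z_i, W_i)\}_{i=1}^n$ is a random sample from a distribution such that $E(Z \mid W) = 0$ and $E(|Z|^{2+\delta} \mid W) < M < \infty$ for almost all $W$, $E(Z^2 \mid W)$ is continuous at $W = 0$, and the function $L$ satisfies $\int |L(v)|^{2+\delta}\, dv < \infty$. Then $\sqrt{n h_n}\, S_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n \xi_{in} \to_d N\!\left(0,\; f_W(0)\, E(Z^2 \mid W = 0) \int L(v)^2\, dv\right)$.

PROOF OF LEMMA 1: (a) Apply Lemma A1 with $Z_i = \Delta x_i^j\, \Delta x_i^l\, \Phi_i$ ($j, l = 1, \ldots, k$), $s = 0$, and $L(v) = K(v)$.

(b-i) Apply Lemma A2 with $\xi_{in} = c' \frac{1}{\sqrt{h_n}} K(W_i/h_n)\, \Delta x_i'\, \Delta\nu_i\, \Phi_i$, where $c$ is a $k \times 1$ vector of constants such that $c'c = 1$.

(b-ii) Note that by Assumption R5, $\Delta\lambda_i = \Lambda_i W_i$. Thus we may write

$$
S_{x\lambda} = \frac{1}{n} \sum_{i=1}^n \frac{1}{h_n} K\!\left(\frac{W_i}{h_n}\right) \Delta x_i'\, \Lambda_i\, W_i\, \Phi_i .
$$

Therefore $E(S_{x\lambda}) = \int \frac{1}{h_n} K(W/h_n)\, W\, g(W)\, dW$, where $g(W) \equiv E(\Delta x' \Lambda \Phi \mid W)\, f_W(W)$ is by assumption $r$ times continuously differentiable with derivatives that are bounded on the support of $W$, and has $g(0) < \infty$. A Taylor series expansion of $g(\cdot)$ around 0 and a change of variables $W = v h_n$ lead to

$$
E(S_{x\lambda}) = \frac{h_n^{r+1}}{r!} \int v^{r+1} K(v)\, g^{(r)}(c_n^*)\, dv
$$

for some $c_n^*$ lying between 0 and $W$, since $\int v^j K(v)\, dv = 0$ for $j = 1, \ldots, r$. Therefore, by bounded convergence,

$$
\sqrt{n h_n}\, E(S_{x\lambda}) \to \tilde{h}\, \Sigma_{x\lambda},
$$

since under our assumptions $\int |v|^{r+1} |K(v)|\, dv < \infty$ and by assumption $\sqrt{n h_n}\, h_n^{r+1} \to \tilde{h}$. Furthermore, by Lemma A1, $\mathrm{var}(S_{x\lambda}) = O(h_n^2/(n h_n))$, which implies that $\mathrm{var}(\sqrt{n h_n}\, S_{x\lambda}) = O(n h_n)\, O(h_n/n) = O(h_n^2) = o(1)$. Hence $\sqrt{n h_n}\, S_{x\lambda} \to_p \tilde{h}\, \Sigma_{x\lambda}$.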
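The Taylor-expansion step in part (b-ii) can be written out term by term. The following is a sketch of the intermediate algebra, not a quotation from the paper. It assumes $g$ is $r$ times continuously differentiable, that the kernel moments $\int v^j K(v)\, dv$ vanish for $j = 1, \ldots, r$, and that $c^*$ denotes a point between 0 and $v h_n$ that may depend on $v$.

```latex
\begin{align*}
E(S_{x\lambda})
  &= \int \frac{1}{h_n} K\!\left(\frac{W}{h_n}\right) W\, g(W)\, dW
   = h_n \int v\, K(v)\, g(v h_n)\, dv
   && (W = v h_n) \\
  &= \sum_{j=0}^{r-1} \frac{h_n^{\,j+1}}{j!}\, g^{(j)}(0) \int v^{\,j+1} K(v)\, dv
     \;+\; \frac{h_n^{\,r+1}}{r!} \int v^{\,r+1} K(v)\, g^{(r)}(c^*)\, dv \\
  &= \frac{h_n^{\,r+1}}{r!} \int v^{\,r+1} K(v)\, g^{(r)}(c^*)\, dv ,
\end{align*}
% Each term in the sum vanishes because j + 1 ranges over 1, ..., r.
% Multiplying by sqrt(n h_n) and letting sqrt(n h_n) h_n^{r+1} -> h-tilde gives
\begin{equation*}
\sqrt{n h_n}\; E(S_{x\lambda})
  \;\longrightarrow\;
  \tilde{h}\, \frac{1}{r!}\, g^{(r)}(0) \int v^{\,r+1} K(v)\, dv .
\end{equation*}
```

Under these assumptions the limit is a bias term of order $h_n^{r+1}$ scaled by $\sqrt{n h_n}$, which is why the text requires $\sqrt{n h_n}\, h_n^{r+1}$ to converge to a finite constant.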

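(c-i) Note that

$$
E(S_{x\nu}) = 0,
$$

while by Lemma A1, $\mathrm{var}(S_{x\nu}) = O((n h_n)^{-1})$. Therefore $E(h_n^{-(r+1)} S_{x\nu}) = 0$ and $\mathrm{var}(h_n^{-(r+1)} S_{x\nu}) = O(h_n^{-2(r+1)} (n h_n)^{-1}) = O((n h_n^{2(r+1)+1})^{-1}) = o(1)$, since by assumption $\sqrt{n h_n}\, h_n^{r+1} \to \infty$ as $n \to \infty$. Thus $h_n^{-(r+1)} S_{x\nu} \to_p 0$.

(c-ii) From part (b-ii) above,

$$
E(h_n^{-(r+1)} S_{x\lambda}) \to \Sigma_{x\lambda}
$$

and

$$
\mathrm{var}(h_n^{-(r+1)} S_{x\lambda}) = O\!\left(h_n^{-2(r+1)}\, \frac{h_n^2}{n h_n}\right) = O\!\left((n h_n^{2r+1})^{-1}\right) = o(1),
$$

since $n h_n^{2(r+1)+1} \to \infty$ implies that $n h_n^{2r+1} \to \infty$. Thus $h_n^{-(r+1)} S_{x\lambda} \to_p \Sigma_{x\lambda}$.

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$.

(iii) In the Taylor series expansions, $c^*_{in}$ stands for a generic value between $W_i$ and $\Delta w_i \hat\gamma_n$.

PROOF OF LEMMA 2: (a) By a Taylor series expansion we can write

$$
\hat{S}_{xx} - S_{xx} = \frac{1}{n} \sum_{i=1}^n \frac{1}{h_n^2} K'\!\left(\frac{c^*_{in}}{h_n}\right) \Delta x_i'\, \Delta x_i\, \Phi_i\, \Delta w_i (\hat\gamma_n - \gamma).
$$

Therefore

$$
\|\hat{S}_{xx} - S_{xx}\| \le M\, \frac{1}{h_n^2}\, \|\hat\gamma_n - \gamma\|\, \frac{1}{n} \sum_{i=1}^n \|\Delta w_i\|\, \|\Delta x_i\|^2 = O_p(n^{2\mu - p}) = o_p(1),
$$

since by assumption $\mu < p/2$, $|K'(v)| < M$, and $E(\|\Delta w\|\, \|\Delta x\|^2) < \infty$.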



1358 EKATERINI KYRIAZIDOU

5 CONCLUSIONS

This paper proposed estimators for a sample selection model from panel data with individual-specific effects We developed a two-step estimation procedure for the parameters of the regression equation of interest which exploits a conditional exchangeability assumption on the errors to difference out both the unobservable individual effect and the sample selection effect in a manner similar to the fixed-effects approach taken in linear panel data models The Monte Carlo results indicate that the estimator may work well in practice with sufficiently large data sets However it is quite sensitive to the choice of the bandwidth parameter which suggests that further research on this issue may be warranted Two more issues will be also left for future investigation

First notice that the exchangeability assumption (Assumption R1) underlying the proposed estimator implies a conditional symmetry restriction for the first-differenced errors of the main equation which could be used to develop a Least Absolute Deviations-type estimator This estimator might then be com- bined optimally with the Least-Squares-type estimator proposed in this paper for efficiency considerations Furthermore LAD estimators might be preferable in the case of heavy-tailed distributions but they do not have closed-form solutions and their asymptotic properties are more difficult to derive

Second although the analysis rested on the strict exogeneity of the explana- tory variables in both equations it is possible to allow for lagged endogenous variables in the set of regressors Honor6 and Kyriazidou (1997) propose estimators for discrete choice panel data models with exogenous regressors individual effects and lags of the dependent discrete variable Kyriazidou (1997) proposes estimators for dynamic sample selection models where the latent equations contain strictly exogenous regressors individual effects and lags of the dependent endogenous variables

Department of Economics Uniuersity of Chicago 1126 E 59th St Chicago Illinois 60637 U SA

Maizuscrrpt receiced May 1994 final reL ision receiced January 199

APPENDIX

The proofs of the results in the main text make use of the following two lemmas which maintain Assumptions R4 and R8 of Section 3

LEMMAAl Let S = is a random sam- (ln)Z=l ( l h )L (M h )Z v s 2 0 where (Z y)]= ple from a disirrbuiron that has E ( I Z I ~ I for L ~at~sfiesW )ltM lt almost all W and the functron

P ~ l v(v ) l d v lt M Then E ( S ) = O ( k i ) and var(S) = O(hnh) Tlzus for s 2 1 S + 0 while for

P s = 0 S +f(O)E(ZI W = O)lL(v)d v procrded that E ( Z I W ) rs contrnuo~ts at W = 0

SAMPLE SELECTION MODEL

PROOF Random sampling implies that

Under our assumptions and by bounded convergence we obtain

The stated probability limits then obtain by Chebyshevs theorem

LEMMAA2 (Liapounov CLT for doublc arrays) Let = (1 l t l )~= I tiwhere an Independent sequence of scalar random ~arrables that satis$es E( (I0 var( (I lt rn var(= +

V lt aand I3= El ( 61 +0 for some 8 E (01) as n + Then Jizh~N(0 V)

PROOF See Theorem 712 and comment on pagc 209 in Chung (1973)

COROLLARY = where (Z U)l= 1s a random sample from a Al Let ( ( I amp)L(w~)z d~stnbutlonsuch that E(ZI W) = 0 and E(IZI 1 W) lt M lt w for almost all W E(Z2 I W) IS conhnuous at W = 0 and the functlon L satrsfies llL(v)l dv lt 53 Then KS= ( l ix)~l=amp N(0

f W ( 0 ) ~ ( Z 2 I ~ =O ) ~ L ( V ) ~ ~ V )

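PROOF OF LEMMA 1: (a) Apply Lemma A1 with $Z_i = \Delta x_i^l \Delta x_i^j \Phi_i$ $(l, j = 1, \dots, k)$, $s = 0$, and $L(v) = K(v)$.

(b-i) Apply Lemma A2 with $\xi_{in} = c'(1/\sqrt{h_n})\,K(W_i/h_n)\,\Delta x_i' \Delta\nu_i \Phi_i$, where $c$ is a $k \times 1$ vector of constants such that $c'c = 1$.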

(b-ii) Note that by Assumption R5, $\Delta\lambda_i = \Lambda_i W_i$. Thus we may write

$S_{x\lambda} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n} K\left(\frac{W_i}{h_n}\right)\Delta x_i' \Lambda_i W_i \Phi_i.$

Therefore $E(S_{x\lambda}) = \int (1/h_n)\,K(W/h_n)\,W g(W)\,dW$, where $g(W) \equiv E(\Delta x' \Lambda \Phi \mid W) f_W(W)$ is by assumption $r$ times continuously differentiable with derivatives that are bounded on the support of $W$, and has $g(0) < \infty$. A Taylor series expansion of $g(\cdot)$ around 0 and a change of variables $W = v h_n$ lead to


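for some $c^*$ lying between 0 and $W$, since $\int v^j K(v)\,dv = 0$ for $j = 1, \dots, r$. Therefore, by bounded convergence,

since under our assumptions $\int |v^{r+1} K(v)|\,dv < \infty$ and by assumption $\sqrt{n h_n}\,h_n^{r+1} \to \tilde h$. Furthermore, by Lemma A1, $\mathrm{var}(S_{x\lambda}) = O(h_n^2/(n h_n))$, which implies that $\mathrm{var}(\sqrt{n h_n}\,S_{x\lambda}) = O(n h_n)\,O(h_n/n) = O(h_n^2) = o(1)$. Hence $\sqrt{n h_n}\,S_{x\lambda} \xrightarrow{p} \tilde h\,\Sigma_{x\lambda}$.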
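The bias calculation in (b-ii) relies on the kernel having vanishing moments, $\int v^j K(v)\,dv = 0$ for $j = 1, \dots, r$. As an illustration only, the sketch below checks these moment conditions numerically for one common higher-order kernel, $K(v) = \tfrac{1}{2}(3 - v^2)\phi(v)$ with $\phi$ the standard normal density, for which $r = 3$. This kernel is an assumption made for the example and is not a choice taken from the paper; `simpson` and `kernel_moment` are hypothetical helper names.

```python
import math


def normal_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def fourth_order_kernel(v):
    """Gaussian-based kernel (3 - v^2)/2 * phi(v); an illustrative assumption."""
    return 0.5 * (3.0 - v * v) * normal_pdf(v)


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number of intervals."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0


def kernel_moment(kernel, j, lo=-12.0, hi=12.0):
    """Approximate the j-th moment of the kernel, truncating the negligible tails."""
    return simpson(lambda v: v ** j * kernel(v), lo, hi)


# Moments of order 0..4: the kernel integrates to one, the moments of order
# 1, 2, 3 vanish, and the fourth moment is the first nonzero one.
moments = [kernel_moment(fourth_order_kernel, j) for j in range(5)]
for j, m in enumerate(moments):
    print(f"moment {j}: {m:+.6f}")
```

The first nonvanishing moment, of order $r + 1$, is the one that enters the leading bias term once the lower-order terms of the Taylor expansion drop out.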

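(c-i) Note that

while by Lemma A1, $\mathrm{var}(S_{x\nu}) = O((n h_n)^{-1})$. Therefore $E(h_n^{-(r+1)} S_{x\nu}) = 0$ and $\mathrm{var}(h_n^{-(r+1)} S_{x\nu}) = O(h_n^{-2(r+1)}(n h_n)^{-1}) = O((n h_n^{2(r+1)+1})^{-1}) = o(1)$, since by assumption $\sqrt{n h_n}\,h_n^{r+1} \to \infty$ as $n \to \infty$. Thus $h_n^{-(r+1)} S_{x\nu} \xrightarrow{p} 0$.

(c-ii) From part (b-ii) above,

and

since $n h_n^{2(r+1)+1} \to \infty$ implies that $n h_n \to \infty$. Thus $h_n^{-(r+1)} S_{x\lambda} \xrightarrow{p} \Sigma_{x\lambda}$.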

REMARKS: (i) In what follows, $M$ stands for a generic constant which is the upper bound of certain quantities.

(ii) We define the matrix norm $\|A\| = \sqrt{\mathrm{trace}(A'A)}$.

(iii) In the Taylor series expansions, $c^*$ stands for a generic value between $W_i/h_n$ and $\hat W_i/h_n$.
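The matrix norm in remark (ii) is the Frobenius norm: the trace of $A'A$ equals the sum of the squared entries of $A$. A minimal sketch of that identity follows, with matrices stored as lists of rows; the helper names and the example matrix are hypothetical and serve only as illustration.

```python
import math


def gram_matrix(a):
    """Return A'A for a matrix given as a list of rows."""
    rows, cols = len(a), len(a[0])
    return [
        [sum(a[i][j] * a[i][k] for i in range(rows)) for k in range(cols)]
        for j in range(cols)
    ]


def matrix_norm(a):
    """Compute ||A|| = sqrt(trace(A'A)) directly from the definition."""
    g = gram_matrix(a)
    return math.sqrt(sum(g[j][j] for j in range(len(g))))


def frobenius_norm(a):
    """Equivalent shortcut: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in a for x in row))


example = [[3.0, 0.0], [0.0, 4.0]]
print(matrix_norm(example), frobenius_norm(example))
```

For a row vector such as the $(1 \times q)$ objects used in the proofs, this norm reduces to the Euclidean length.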

PROOF OF LEMMA 2: (a) By a Taylor series expansion we can write

Therefore

since by assumption $\mu < p/2$, $|K'(v)| < M$, and $E(\|\Delta w\|\,\|\Delta x\|^2) < \infty$.


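(b-i) Let $\hat S^l_{x\nu}$ and $S^l_{x\nu}$ denote the $l$th ($l = 1, \dots, k$) elements of $\hat S_{x\nu}$ and $S_{x\nu}$, respectively. A third order Taylor series expansion yields

$$\begin{aligned}
\sqrt{n h_n}\,(\hat S^l_{x\nu} - S^l_{x\nu})
&= \frac{1}{\sqrt{n h_n}}\sum_{i=1}^{n} K'\left(\frac{W_i}{h_n}\right)\Delta x_i^l \Delta\nu_i \Phi_i\,\Delta w_i\,\frac{(\hat\gamma_n - \gamma)}{h_n} \\
&\quad + \frac{1}{\sqrt{n h_n}}\sum_{i=1}^{n} \frac{1}{2} K''\left(\frac{W_i}{h_n}\right)\Delta x_i^l \Delta\nu_i \Phi_i \left(\frac{\Delta w_i(\hat\gamma_n - \gamma)}{h_n}\right)^2 \\
&\quad + \frac{1}{\sqrt{n}\,h_n^{7/2}}\,\frac{1}{6}\sum_{i=1}^{n} K'''(c_i^*)\,\Delta x_i^l \Delta\nu_i \Phi_i \left(\Delta w_i(\hat\gamma_n - \gamma)\right)^3 \\
&\equiv A_1\,\frac{(\hat\gamma_n - \gamma)}{h_n} + \frac{1}{2}\,\frac{(\hat\gamma_n - \gamma)'}{h_n}\,A_2\,\frac{(\hat\gamma_n - \gamma)}{h_n} + A_3 .
\end{aligned}$$

We will show that $A_1$ and $A_2$ are $O_p(1)$ while $A_3 = o_p(1)$. The desired result will then follow from the fact that $\mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$.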

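Let $A_1^j$ be the $j$th element ($j = 1, \dots, q$) of the $(1 \times q)$ vector $A_1$. Write $A_1^j = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{in}$, where $\xi_{in} = (1/\sqrt{h_n})\,K'(W_i/h_n)\,\Delta x_i^l \Delta\nu_i \Phi_i \Delta w_i^j$. Note that $\{\xi_{in}\}_{i=1}^{n}$ is a sequence of scalar random variables that satisfies the requirements of Lemma A2, since under our assumptions $E(|\Delta x^l \Delta\nu\,\Delta w^j|^{2+\delta} \mid W) < M$ for almost all $W$, while $|K'(v)| < M$ and $\int |K'(v)|\,dv < \infty$ imply that $\int |K'(v)|^{2+\delta}\,dv < \infty$. Therefore $A_1^j$ is bounded in probability.

Similarly, we can show that the $jm$th element ($j, m = 1, \dots, q$) of the $(q \times q)$ matrix $A_2$ is also bounded in probability, by defining $\xi_{in} = (1/\sqrt{h_n})\,K''(W_i/h_n)\,\Delta x_i^l \Delta\nu_i \Phi_i \Delta w_i^j \Delta w_i^m$, since $E(|\Delta x^l \Delta\nu\,\Delta w^j \Delta w^m|^{2+\delta} \mid W) < M$ for almost all $W$, and the boundedness and absolute integrability of $K''(v)$ imply that $\int |K''(v)|^{2+\delta}\,dv < \infty$.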

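Next observe that, since $p > 2/5$ and $\mu < p/2$ imply that $(1/2) + (7\mu/2) - 3p < 0$,

$$\|A_3\| \le M\,\frac{\sqrt{n}}{h_n^{7/2}}\,\|\hat\gamma_n - \gamma\|^3\,\frac{1}{n}\sum_{i=1}^{n}\|\Delta w_i\|^3\,|\Delta x_i^l \Delta\nu_i| = O_p\left(n^{(1/2) + (7\mu/2) - 3p}\right) = o_p(1).$$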

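(b-ii) Let $\hat S^l_{x\lambda}$ and $S^l_{x\lambda}$ denote the $l$th ($l = 1, \dots, k$) elements of $\hat S_{x\lambda}$ and $S_{x\lambda}$, respectively. A third order Taylor series expansion yields

$$\begin{aligned}
\sqrt{n h_n}\,(\hat S^l_{x\lambda} - S^l_{x\lambda})
&= \frac{1}{\sqrt{n h_n}}\sum_{i=1}^{n} K'\left(\frac{W_i}{h_n}\right)\Delta x_i^l \Delta\lambda_i \Phi_i\,\Delta w_i\,\frac{(\hat\gamma_n - \gamma)}{h_n} \\
&\quad + \frac{1}{\sqrt{n h_n}}\sum_{i=1}^{n} \frac{1}{2} K''\left(\frac{W_i}{h_n}\right)\Delta x_i^l \Delta\lambda_i \Phi_i \left(\frac{\Delta w_i(\hat\gamma_n - \gamma)}{h_n}\right)^2 \\
&\quad + \frac{1}{\sqrt{n}\,h_n^{7/2}}\,\frac{1}{6}\sum_{i=1}^{n} K'''(c_i^*)\,\Delta x_i^l \Delta\lambda_i \Phi_i \left(\Delta w_i(\hat\gamma_n - \gamma)\right)^3 \\
&\equiv B_1\,\sqrt{n h_n}\,(\hat\gamma_n - \gamma) + \frac{1}{2}\,\sqrt{n h_n}\,(\hat\gamma_n - \gamma)'\,B_2\,\frac{(\hat\gamma_n - \gamma)}{h_n} + B_3 .
\end{aligned}$$

We will show that $B_1$ and $B_2$ are $O_p(1)$ while $B_3 = o_p(1)$. The desired result will then follow from the fact that $1 - 2p < \mu < p/2$ implies that $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$ and $\sqrt{n h_n}\,(\hat\gamma_n - \gamma) = O_p(n^{(1-\mu)/2 - p}) = o_p(1)$.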


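Note that $B_1$ is a $(1 \times q)$ row-vector. For its $j$th element,

$$B_1^j = \frac{1}{h_n}\cdot\frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n} K'\left(\frac{W_i}{h_n}\right)\Delta x_i^l \Lambda_i W_i \Phi_i \Delta w_i^j ,$$

application of Lemma A1 with $s = 1$, $Z_i = \Delta x_i^l \Lambda_i \Phi_i \Delta w_i^j$, and $L(v) = K'(v)$ yields

$$E(B_1^j) = \frac{1}{h_n}\,O(h_n) = O(1) \quad\text{and}\quad \mathrm{var}(B_1^j) = \frac{1}{h_n^2}\,O\left(\frac{h_n^2}{n h_n}\right) = o(1),$$

since $E((\Delta x^l)^2 \Lambda^2 (\Delta w^j)^2 \mid W) < \infty$ for almost all $W$ and $\int |v K'(v)|\,dv < \infty$.

Similarly, we can show that the $jm$th element ($j, m = 1, \dots, q$) of the $(q \times q)$ matrix $B_2$ is also bounded in probability, since $E((\Delta x^l)^2 \Lambda^2 (\Delta w^j)^2 (\Delta w^m)^2 \mid W) < \infty$ for almost all $W$ and $\int |v K''(v)|\,dv < \infty$.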

Next observe that

since under our assumptions $(1/2) + (7\mu/2) - 3p < 0$, $\gamma$ lies in a compact set, and $E(\|\Delta x\|\,\|\Delta w\|^4) < \infty$.

(c-i) Note that with $h_n = h \cdot n^{-\mu}$, the condition $n h_n^{2(r+1)+1} \to \infty$ implies that $\mu < 1/(2(r+1)+1)$.
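The bandwidth restriction in (c-i) is a statement about exponents: with $h_n = h \cdot n^{-\mu}$, the quantity $n h_n^{2(r+1)+1}$ is proportional to $n^{1 - \mu(2(r+1)+1)}$, which diverges exactly when that exponent is positive. The sketch below does this bookkeeping with exact fractions. The function names and the sample values of $\mu$ and $r$ are hypothetical, chosen only to illustrate the inequality.

```python
from fractions import Fraction


def max_bandwidth_exponent(r):
    """Upper bound 1 / (2(r + 1) + 1) on mu implied by n * h_n^(2(r+1)+1) -> infinity."""
    return Fraction(1, 2 * (r + 1) + 1)


def growth_exponent(mu, r):
    """Exponent of n in n * h_n^(2(r+1)+1) when h_n = h * n^(-mu)."""
    return 1 - Fraction(mu) * (2 * (r + 1) + 1)


def satisfies_condition(mu, r):
    """True when n * h_n^(2(r+1)+1) diverges, i.e. when the exponent of n is positive."""
    return growth_exponent(mu, r) > 0


r = 1
print("bound on mu:", max_bandwidth_exponent(r))
print("mu = 1/6 admissible:", satisfies_condition(Fraction(1, 6), r))
print("mu = 1/4 admissible:", satisfies_condition(Fraction(1, 4), r))
```

At the boundary value of $\mu$ the exponent is zero, so the product stays bounded rather than diverging, which is why the inequality is strict.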

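In what follows we will use the fact that for $r \ge 1$

Define $\hat S^l_{x\nu}$ and $S^l_{x\nu}$ as before. A third order Taylor series expansion yields

$$\begin{aligned}
\frac{1}{h_n^{r+1}}\,(\hat S^l_{x\nu} - S^l_{x\nu})
&= \frac{1}{\sqrt{n h_n}\,h_n^{r+1}}\,A_1\,\frac{(\hat\gamma_n - \gamma)}{h_n}
 + \frac{1}{2}\,\frac{1}{\sqrt{n h_n}\,h_n^{r+1}}\,\frac{(\hat\gamma_n - \gamma)'}{h_n}\,A_2\,\frac{(\hat\gamma_n - \gamma)}{h_n} + A_4 , \\
A_4 &\equiv \frac{1}{n\,h_n^{r+5}}\,\frac{1}{6}\sum_{i=1}^{n} K'''(c_i^*)\,\Delta x_i^l \Delta\nu_i \Phi_i \left(\Delta w_i(\hat\gamma_n - \gamma)\right)^3 ,
\end{aligned}$$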


where $A_1$ and $A_2$ are defined as in the proof of part (b-i). As we showed there, both these quantities are bounded in probability for any $h_n$ that satisfies $h_n \to 0$ and $n h_n \to \infty$ as $n$ increases. Furthermore, from (i) above, $h_n^{-1}(\hat\gamma_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$. Thus the first two terms of the sum above are $o_p(1)$. Now, by (ii),

(c-ii) Let $\hat S^l_{x\lambda}$ and $S^l_{x\lambda}$ be defined as before. A third order Taylor series expansion yields

where $B_1$ and $B_2$ are defined as in the proof of part (b-ii), and as we showed there, they are bounded in probability for any $h_n$ that satisfies $n h_n \to \infty$ as $n$ increases. Thus the first two terms of the sum above are $o_p(1)$. Furthermore,

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics, Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons--A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.



Page 27: Estimation of a Panel Data Sample Selection Model ... · The JSTOR Archive is a trusted digital repository providing for long-term preservation and access to leading academic journals

1

1360 EKATERINI KYRIAZIDOU

for some c i lying between 0 and W since jvlK(v) dv = 0 for j = 1 r Therefore by bounded convergence

since under our assumptions I vlr+ K( v) dv lt aand by assumption K h + +amp Furthermore

by Lemma Al var(SxA) = ~ ( h i n h ) which ~mplies that var(Jnh~ ) = O(nh)O(hn) = hi) P -

= dl) Hencc Ks-) hXX

(c-i) Note that

while by Lemma Al var(S = O((nh)-1 Therefore E(h(+ )S ) = 0 and var(h (+ )Sx)=

~ ( h ~ ( + ) Since by assumption K h + as n +a (nh)-) = ~ ( ( i z h ~ ( ~ + ) + ) - )=o i l )

Thus h~ + 0 (c-ii) From part (b-ii) above

and

s~nce nh(+ )+ + implies that nh ++a Thus h(+ S rA P z~~+

REMARKSii) In what follows A4 stands for a generic constant which is the uppcr bound of certain quantities

(ii) We define the matrix norm IIAll= dtrace(AA) (iii) In the Taylor series expansions c stands for a generic value between U and

PROOFOF LEMMA2 (a) By a Taylor series expansion we can write

Therefore

since by assumption p ltp2 IK1(v)l lt m and E(llAwIlll~x11~) lt a

SAMPLE SELECTION MODEL 1361

(b-i) Let $itand s dcnote the Ith (I = 1 k ) elements of fxand S respectively A third order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that A and A are 0(1) while A = o(l) The desired result will then follow from the fact that p lt p 2 implies that h i 1 ( - y ) = Op(niL-1= o(l)

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next obscrve that since p gt 2 5 and u lt p 2 imply that (1 2 ) + ( 7 ~ 1 2 )- 3p lt 0

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l

(b-ii) Let f and S-L denote the lth (I = 1 k ) elements of $ and S respectively 4 third order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

We will show that Bl and B are 0(1) while B3 = o(l) Thc desired result will thenfollow from the fact that 1 - 2 p lt u lt p 2 implies that hi1(- y ) = Op(nL-) = o(l) and - y ) = o(n -~-) = o(l)

1362 EKATERINI KYRIAZIDOU

Note that Bl is a ( I x q ) row-vector For its jth element

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h

since E ( A ~ ~ W )lt a for almost all W and l v ~ ( v ) l ~A 2 ~ w j 2 d v lt a

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix B

is also in since A 2 ~ ~ j 2 ~ ~ n 2 for allbounded probability E ( A X ~ W )lt a almost W and JIvK1 (v) ldvlta

Next observe that

since under assumptions (1 2 ) + ( 7 ~ 2 )- lt a3 p lt 0 y lies in a compact set and E(llAx1 I A W I ~ ) (c-i) Note that with h =h n - the condition nh(+)++a implies that p lt 1(2(r + 1)+ 1)

In what follows we will use the fact that for r r 1

Define f and s as before A third order Taylor series expansion yields

1 I n W 1 1+-ci-yi(r E n r f ( i i ) - ( - Y )2 4 n x j n a q aw nw

nhn = I id-n h hi+ h

1 1 1 1 1 - (Tn -y ) +-(+-ylA2 -(+ - Y ) + A 4

= h h 2 h 4a+

SAMPLE SELECTION MODEL 1363

where Ai and A are defined as in the proof of part (b-1) As we showed there both these quantities are bounded in probability for any h that satisfies h -O and nh -t 13 as n increases Furthermore from (1) above hi1( - y) = op(l) T ~ L I S OP(nF-) = the first two terms of the sum above are o(l) Now by (21

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where Bi and B2 are defined as in the proof of part (b-ii) and as we showed there they arc houndcd in probability for any I that satisfies nh + 13 as n increases Thus the first two terms of the sum above are o(l) Furthermore

REFERENCES

AHN, H., AND J. L. POWELL (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics, 58, 3-29.

AMEMIYA, T. (1985): Advanced Econometrics. Cambridge: Harvard University Press.

ANDERSEN, E. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32, 283-301.

BIERENS, H. J. (1987): "Kernel Estimators of Regression Functions," in Advances in Econometrics: Fifth World Congress, Vol. 1, ed. by T. F. Bewley. Cambridge: Cambridge University Press.

CAVANAGH, C. L. (1987): "Limiting Behavior of Estimators Defined by Optimization," unpublished manuscript.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Volume II, edited by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, Ch. 22.

--- (1992): "Binary Response Models for Panel Data: Identification and Information," unpublished manuscript, Department of Economics, Harvard University.

CHARLIER, E., B. MELENBERG, AND A. H. O. VAN SOEST (1995): "A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation," Statistica Neerlandica, 49, 324-342.

CHUNG, K. L. (1974): A Course in Probability Theory. New York: Academic Press.

GRONAU, R. (1974): "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, 82, 1119-1143.

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of A Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.



(b-i) Let $\hat{S}^{\ell}$ and $S^{\ell}$ denote the $\ell$th ($\ell = 1, \ldots, k$) elements of $\hat{S}$ and $S$, respectively. A third-order Taylor series expansion yields

$m$l-s)

1 1+ liiz -- K AX d~~ (div(Tn - y113hj 6n i =

We will show that $A_1$ and $A_2$ are $O_p(1)$ while $A_3 = o_p(1)$. The desired result will then follow from the fact that $\mu < p/2$ implies that $h_n^{-1}(\hat{\gamma}_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$.

Let A be the jth element ( j = 1 q ) of the (1 X q ) vector A Write A t= l vz~= where t= ( I f i )K(ampltlz ) AX Ami Aw) Note that (Jz= is a sequence of scalar random variables that satisfies the requirements of Lemma A since under our assumptions ~ ( l d x d w ~ l r ~ ~ for almost all W while lK(v)l lt w and l IKf (v) l d v lt imply that W )lt j 1 K(v)12d v lt m Therefore A is bounded in probability

Similarly we can show that the jmth element (j m = 1 q ) of the ( q X q ) matrix A is also bounded in probability by defining c AX dc d w i Aw= ( l V K ) ~ ( ~ h ) since ~ ( i W )lt m for almost all Wand the boundedness and absolute integrability of As Awl Awn Aci2+ 1 K ( v ) implies that l j ~ ( v ) ~ ~ b vlt a

Next observe that, since $p > 2/5$ and $\mu < p/2$ imply that $(1/2) + (7\mu/2) - 3p < 0$,

1 1 llA311S M ~ L - I I ~- I lA~~ l l yl13- l l A w l 1 ~ 1 ~ ~ ~ ~ 1

hj2 r = l
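The exponent inequality invoked in part (b-i) can be checked directly. The following is a brief verification that uses only the two stated restrictions, $p > 2/5$ and $\mu < p/2$:

```latex
\begin{align*}
\frac{1}{2} + \frac{7\mu}{2} - 3p
  &< \frac{1}{2} + \frac{7}{2}\cdot\frac{p}{2} - 3p
   && \text{(since } \mu < p/2\text{)} \\
  &= \frac{1}{2} - \frac{5p}{4} \\
  &< \frac{1}{2} - \frac{5}{4}\cdot\frac{2}{5} = 0
   && \text{(since } p > 2/5\text{)} .
\end{align*}
```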

(b-ii) Let $\hat{S}^{\ell}$ and $S^{\ell}$ denote the $\ell$th ($\ell = 1, \ldots, k$) elements of $\hat{S}$ and $S$, respectively. A third-order Taylor series expansion yields

JlZh($ - S)

+ amp-1 -1 x K AX AA B ( A ~ ( - y i l 3

h7 6n =

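We will show that $B_1$ and $B_2$ are $O_p(1)$ while $B_3 = o_p(1)$. The desired result will then follow from the fact that $1 - 2p < \mu < p/2$ implies that $h_n^{-1}(\hat{\gamma}_n - \gamma) = O_p(n^{\mu - p}) = o_p(1)$ and $\sqrt{n h_n}\,(\hat{\gamma}_n - \gamma) = O_p(n^{(1-\mu)/2 - p}) = o_p(1)$.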
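Both rate claims in part (b-ii) come down to counting powers of $n$. The sketch below assumes that the first-step estimator, written here as $\hat{\gamma}_n$, satisfies $\hat{\gamma}_n - \gamma = O_p(n^{-p})$ and that $h_n = h\,n^{-\mu}$; the second line is one reading of the partly illegible second claim, chosen because it is the quantity for which the lower bound $\mu > 1 - 2p$ is exactly what is needed:

```latex
\begin{align*}
h_n^{-1}(\hat{\gamma}_n - \gamma)
  &= O_p\bigl(n^{\mu}\cdot n^{-p}\bigr)
   = O_p\bigl(n^{\mu - p}\bigr) = o_p(1)
  && \text{because } \mu < p/2 < p, \\
\sqrt{n h_n}\,(\hat{\gamma}_n - \gamma)
  &= O_p\bigl(n^{(1-\mu)/2}\cdot n^{-p}\bigr)
   = O_p\bigl(n^{(1-\mu)/2 - p}\bigr) = o_p(1)
  && \text{because } \mu > 1 - 2p .
\end{align*}
```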


Note that $B_1$ is a $(1 \times q)$ row-vector. For its $j$th element,

application of Lemma A1 with s = = yields1 Z 3AX A Awj and ~ ( v )~ ( v )

1 E ( B f )= - O(h )= O(1) and

h



Page 30: Estimation of a Panel Data Sample Selection Model ... · The JSTOR Archive is a trusted digital repository providing for long-term preservation and access to leading academic journals

SAMPLE SELECTION MODEL 1363

where Ai and A are defined as in the proof of part (b-1) As we showed there both these quantities are bounded in probability for any h that satisfies h -O and nh -t 13 as n increases Furthermore from (1) above hi1( - y) = op(l) T ~ L I S OP(nF-) = the first two terms of the sum above are o(l) Now by (21

(c-ii) Lct $ and Sf be defined as before A third order Taylor series evpansion yiclds

where Bi and B2 are defined as in the proof of part (b-ii) and as we showed there they arc houndcd in probability for any I that satisfies nh + 13 as n increases Thus the first two terms of the sum above are o(l) Furthermore

REFERENCES

AHNH AND J L POWELL (1993) Semiparametric Estimation of Censorcd Selection Models with a Nonparamctric Selection Mechanism Journal of Econometrics 58 3-29

AMEMIYAT (1985) Aduancetl Econometrics Cambridge Harvard University Prcss ANDERSEWE (1970) Asymptotic Properties of Conditional Maximum Likelihood Estimators

Jortrrzal of the Royal Statistical Sociely Series B 32 283-301 BIERENSH J (1987) Kernel Estimators of Regression Functions in Advaaces in Ecor~omefrics

Fifih World Congress Vol 1 ed by T F Bewley Cambridge Cambridge University Prcss CAVANAGHC L (1987) Limiting Behavior of Estimators Defined by Optimization unpublished

manuscript CHAMBERLAING (1984) Panel Data Handbook of Econometrics Volume 11 edited by Z

Griliches and M Intriligator Amsterdam North-Holland Ch 22 -(1992) Binary Response Models for Panel Data Identification and Information unpub-

lished manuscript Department of Econon~ics Haward University CHARLIER AND A H 0 VANE B MELENBERG SOEST (1995) A Smoothed Maximum Score

Estimator for the Binary Choice Panel Data Model with an Application to Labour Force Participation Sfatistica fiderlandica 49 324-342

CHUNGK L (1974) A Course in Probabilily Theoqi New York Academic Press GRONAUR (1974) Wage Comparisons-A Selectivity Bias Joztrnal of Political Eco~zorrzy 82

1110-1144

1364 EKATERINI KYRIAZIDOU

HÄRDLE, W. (1990): Applied Nonparametric Regression. Cambridge: Cambridge University Press.

HAUSMAN, J. A., AND D. WISE (1979): "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 47, 455-473.

HECKMAN, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42, 679-694.

--- (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 15, 475-492.

--- (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

HONORÉ, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects," Econometrica, 60, 533-565.

--- (1993): "Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables," Journal of Econometrics, 59, 35-61.

HONORÉ, B. E., AND E. KYRIAZIDOU (1997): "Panel Data Discrete Choice Models with Lagged Dependent Variables," unpublished manuscript.

HOROWITZ, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, 60, 505-531.

HSIAO, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.

KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

KYRIAZIDOU, E. (1994): "Estimation of a Panel Data Sample Selection Model," unpublished manuscript, Northwestern University.

--- (1997): "Estimation of Dynamic Panel Data Sample Selection Models," unpublished manuscript, University of Chicago.

MANSKI, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

--- (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation," Journal of Econometrics, 27, 313-334.

--- (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

NIJMAN, T., AND M. VERBEEK (1992): "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7, 243-257.

POWELL, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

--- (1994): "Estimation of Semiparametric Models," Handbook of Econometrics, Vol. 4, 2444-2521.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmarks Paedagogiske Institut.

--- (1961): "On General Laws and the Meaning of Measurement in Psychology," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley and Los Angeles: University of California Press.

ROSHOLM, M., AND N. SMITH (1994): "The Danish Gender Wage Gap in the 1980s: A Panel Data Study," Working Paper 94-2, Center for Labour Market and Social Research, University of Aarhus and Aarhus School of Business.

SILVERMAN, B. W. (1986): Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.

VERBEEK, M., AND T. NIJMAN (1992): "Testing for Selectivity Bias in Panel Data Models," International Economic Review, 33, 681-703.

WOOLDRIDGE, J. M. (1995): "Selection Corrections for Panel Data Models under Conditional Mean Independence Assumptions," Journal of Econometrics, 68, 115-132.

Estimation of a Panel Data Sample Selection Model. Ekaterini Kyriazidou. Econometrica, Vol. 65, No. 6 (Nov., 1997), pp. 1335-1364. Stable URL: http://links.jstor.org/sici?sici=0012-9682%28199711%2965%3A6%3C1335%3AEOAPDS%3E2.0.CO%3B2-B
