Chapter 16

Chapter 16 Qualitative and Limited Dependent Variable Models ECON 6002 Econometrics Memorial University of Newfoundland Adapted from Vera Tabakova’s notes


Page 1: Chapter  16

Chapter 16

Qualitative and Limited Dependent Variable Models

ECON 6002 Econometrics Memorial University of Newfoundland

Adapted from Vera Tabakova’s notes

Page 2: Chapter  16

Chapter 16: Qualitative and Limited Dependent Variable Models

16.1 Models with Binary Dependent Variables

16.2 The Logit Model for Binary Choice

16.3 Multinomial Logit

16.4 Conditional Logit

16.5 Ordered Choice Models

16.6 Models for Count Data

16.7 Limited Dependent Variables: Heckman selection model

Slide 16-2 Principles of Econometrics, 3rd Edition

Page 3: Chapter  16

16.7.6 Sample Selection

Problem: our sample is not a random sample. The data we observe are “selected” by a systematic process for which we do not account

We wonder about the relationship between x and y, but data are available only for observations in which another variable, z*, exceeds a certain value

Selection bias occurs when your sample is truncated and the cause of that truncation is correlated with the dependent variable
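To see why truncation that is correlated with the dependent variable matters, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the true slope of 2, the selection rule z* = x + u, and the error correlation of 0.8. It compares the OLS slope on the full sample with the slope on the selected sample only.

```python
import math
import random


def ols_slope(xs, ys):
    """Slope of the least squares line of ys on xs (with intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, rho=0.8, seed=1):
    """Draw (x, y) pairs and flag which ones survive selection."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        u = rng.gauss(0, 1)  # error in the selection rule
        # outcome error e is correlated with u (correlation rho)
        e = rho * u + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        y = 1.0 + 2.0 * x + e  # true slope is 2
        selected = (x + u) > 0  # y is observed only if z* = x + u > 0
        rows.append((x, y, selected))
    return rows


rows = simulate()
slope_full = ols_slope([r[0] for r in rows], [r[1] for r in rows])
kept = [r for r in rows if r[2]]
slope_selected = ols_slope([r[0] for r in kept], [r[1] for r in kept])
print(f"full sample slope:     {slope_full:.3f}")
print(f"selected sample slope: {slope_selected:.3f}")
```

Under these assumptions the selected-sample slope falls well short of 2 while the full-sample slope does not. Setting `rho` to 0 removes the gap, because the truncation is then unrelated to the outcome error.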


Page 4: Chapter  16

16.7.6 Heckit

Solution: a technique called Heckit, named after its developer, James Heckman

Heckman was awarded the Nobel Prize in 2000 for this contribution: “for his development of theory and methods for analyzing selective samples”

His contribution to economics as a whole is gigantic, however (he is considered one of the ten most influential economists alive)

Who did he fly to Stockholm with?

Page 5: Chapter  16

16.7.6 Heckit

Other examples of selection issues:

Will GRE scores help us screen MA applicants for 2014?

Does getting married before “shacking up” help keep marriages out of divorce proceedings?

Planes coming back from the war with bullet holes?

Page 6: Chapter  16

16.7.6 Sample Selection

For example, you go to George St. to collect data on the drinking habits of MUN students (very convenient, but that is why convenience samples are not valid for general inference!)

To the extent that the likelihood of someone being there is somehow related to the number of drinks they are going to have, you would have a sample selection issue

We want our sampling to be random, or at least exogenous

Endogenous sampling leads to inconsistent and biased estimation

Page 7: Chapter  16

16.7.6 Sample Selection

Sample selection can arise in many settings and for different reasons, so there are many “sample selection models”

For example, selection into the sample may be due to self-selection, with the outcome of interest determined in part by individual choice of whether or not to participate in the activity of interest

That is why you want your census to be compulsory!


Page 8: Chapter  16

16.7.6 Sample Selection

The Tobit model can be considered a basic type of selection model too

More flexible extensions of Tobit are what most people refer to as sample selection models

A simple extension is to consider a bivariate sample selection model (as labelled by Cameron and Trivedi), which generalizes the Tobit model by introducing a censoring latent variable that differs from the latent variable generating the outcome of interest

Example: something other than the desired number of hours of work must be prompting wives to go to work

Amemiya calls this model Tobit II (while we already know about Tobit I above)

Page 9: Chapter  16

16.7.6 Sample Selection

Consistent estimation under sample selection on unobservables relies on quite strong distributional assumptions

Experimental data would allow us to avoid selection problems by using random assignment to a treatment

However, experiments can be difficult to implement in economics applications for both cost and ethical reasons

The treatment effects approach attempts to apply the experimental approach to observational data (See Cameron and Trivedi MMA, Ch 25)

A growing body of work takes this type of approach

Page 10: Chapter  16

16.7.6 Sample Selection

We will focus on this simple model, which goes by many names: Tobit II model / bivariate sample selection model / Heckman model / Heckit / Tobit model with stochastic threshold …

There’s more!!!

Wooldridge calls it a model with a probit selection equation

Others call it the generalized Tobit model

Others call it simply “the” selection model, but there are many selection models

Page 11: Chapter  16

16.7.6 Sample Selection

Let $y_2^*$ denote the outcome of interest (say, the wives’ wages or, in our initial example, how many hours to work)

Tobit assumes that this outcome is observed if $y_2^* > 0$

A more general model uses a different latent variable, $y_1^*$, such that $y_2^*$ is observed if $y_1^* > 0$

$y_1^*$ determines whether to work or not BUT NOT how much to work; $y_2^*$ does
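The two observation rules can be written side by side. This is a sketch in the slide's own notation; the observed indicator $y_1$ and the observed outcome $y_2$ (without stars) are labels introduced here for illustration.

```latex
% Tobit (type I): one latent variable governs both participation and amount
y_2 =
\begin{cases}
  y_2^* & \text{if } y_2^* > 0 \\
  0     & \text{otherwise}
\end{cases}

% Tobit II (bivariate sample selection): a separate latent variable
% y_1^* governs participation, y_2^* governs the amount
y_1 =
\begin{cases}
  1 & \text{if } y_1^* > 0 \\
  0 & \text{otherwise}
\end{cases}
\qquad
y_2 =
\begin{cases}
  y_2^*             & \text{if } y_1^* > 0 \\
  \text{unobserved} & \text{otherwise}
\end{cases}
```

Setting $y_1^* = y_2^*$ in the second rule gives back the first, which is the sense in which Tobit II generalizes Tobit.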


Page 12: Chapter  16

16.7.6 Sample Selection

The Heckit technique famously takes into account that the decision to work may be correlated with the expected wage as in the mroz.dta example

We only observe the wages of women who do work; we also observe the non-working wives, but we have no wage for them

If the decision to work is related to some unobservable characteristic that also affects their wage, we are in trouble

The decision to work could be informed by many things that have nothing to do with the wage one earns, but wives work if the wage they are offered exceeds their reservation wage… so clearly we have an issue!

Page 13: Chapter  16

16.7.6 Sample Selection

Classic application was to labor supply, where $y_1^*$ is the unobserved desire or propensity to work and $y_2$ is actual hours worked

See Mroz (1987)

Page 14: Chapter  16

16.7.6a The Econometric Model

The econometric model describing the situation is composed of two equations. The first is the selection equation (or participation equation), which determines whether the variable of interest is observed.

$$z_i^* = \gamma_1 + \gamma_2 w_i + u_i, \qquad i = 1, \ldots, N \tag{16.37}$$

$$z_i = \begin{cases} 1 & \text{if } z_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \tag{16.38}$$

Page 15: Chapter  16

16.7.6a The Econometric Model

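The second equation is the linear model of interest. It is estimated only on the n observations for which we have information on y

$$y_i = \beta_1 + \beta_2 x_i + e_i, \qquad i = 1, \ldots, n, \quad N > n \tag{16.39}$$

Conditional on being selected, the expected outcome is

$$E\left[y_i \mid z_i^* > 0\right] = \beta_1 + \beta_2 x_i + \beta_\lambda \lambda_i, \qquad i = 1, \ldots, n \tag{16.40}$$

where $\lambda_i$ is the inverse Mills ratio

$$\lambda_i = \frac{\phi(\gamma_1 + \gamma_2 w_i)}{\Phi(\gamma_1 + \gamma_2 w_i)} \tag{16.41}$$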
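The extra term in (16.40) can be traced to the correlation between the two errors. The derivation below is a sketch under the usual textbook assumption, stated here rather than on the slide, that $u_i$ and $e_i$ are jointly normal with mean zero, $\mathrm{Var}(u_i) = 1$, $\mathrm{Var}(e_i) = \sigma_e^2$ and correlation $\rho$.

```latex
\begin{align*}
e_i &= \rho\sigma_e u_i + \xi_i,
      \qquad \xi_i \text{ independent of } u_i \\
E\left[e_i \mid z_i^* > 0\right]
    &= \rho\sigma_e \, E\left[u_i \mid u_i > -(\gamma_1 + \gamma_2 w_i)\right] \\
    &= \rho\sigma_e \, \frac{\phi(\gamma_1 + \gamma_2 w_i)}{\Phi(\gamma_1 + \gamma_2 w_i)}
     = \beta_\lambda \lambda_i,
      \qquad \beta_\lambda = \rho\sigma_e
\end{align*}
```

So $\beta_\lambda = 0$ exactly when $\rho = 0$: if the errors are uncorrelated, the selected-sample regression needs no correction.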

Page 16: Chapter  16

16.7.6a The Econometric Model

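The estimated “Inverse Mills Ratio” is

$$\tilde{\lambda}_i = \frac{\phi(\tilde{\gamma}_1 + \tilde{\gamma}_2 w_i)}{\Phi(\tilde{\gamma}_1 + \tilde{\gamma}_2 w_i)}$$

The estimating equation is

$$y_i = \beta_1 + \beta_2 x_i + \beta_\lambda \tilde{\lambda}_i + v_i, \qquad i = 1, \ldots, n \tag{16.42}$$

Including the estimated inverse Mills ratio stands in for the missing information (the omitted variable) that was biasing OLS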

A test of whether or not the errors are correlated and sample selection correction is needed can be built as a Wald test of the estimated coefficient of the inverse Mills ratio

Page 17: Chapter  16

16.7.6a The Econometric Model

Both the usual OLS standard errors and the heteroskedasticity-robust standard errors from a manually run second-step regression are incorrect; use Heckit software!
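The two steps can be sketched by hand on simulated data. This is an illustration under stated assumptions, not a replacement for Heckit software: the parameter values (selection coefficients 0.2 and 1, outcome coefficients 1 and 2, error correlation 0.8) and all function names are hypothetical, and the code returns point estimates only because, as noted above, the second-step OLS standard errors are not valid.

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least squares coefficients; each row already includes the constant."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def probit(rows, z, iterations=15):
    """Probit maximum likelihood by Newton-Raphson, starting from zero."""
    k = len(rows[0])
    g = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, zi in zip(rows, z):
            c = sum(gj * rj for gj, rj in zip(g, r))
            pdf, cdf = N01.pdf(c), N01.cdf(c)
            # score contribution of this observation (generalized residual)
            s = pdf / cdf if zi else -pdf / (1.0 - cdf)
            h = s * (s + c)  # minus the second derivative, always positive
            for i in range(k):
                grad[i] += s * r[i]
                for j in range(k):
                    hess[i][j] += h * r[i] * r[j]
        step = solve(hess, grad)
        g = [gj + sj for gj, sj in zip(g, step)]
    return g


def imr(c):
    """Inverse Mills ratio phi(c) / Phi(c)."""
    return N01.pdf(c) / N01.cdf(c)


# Simulated data: w enters selection only (an exclusion restriction)
rng = random.Random(0)
n, rho = 4000, 0.8
W, X, Z, Y = [], [], [], []
for _ in range(n):
    w = rng.gauss(0, 1)
    x = 0.7 * w + rng.gauss(0, 1)
    u = rng.gauss(0, 1)
    e = rho * u + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
    W.append(w)
    X.append(x)
    Z.append(1 if 0.2 + 1.0 * w + u > 0 else 0)
    Y.append(1.0 + 2.0 * x + e)  # used only where Z == 1

# Step 1: probit of z on (1, w) using all N observations
gamma = probit([[1.0, w] for w in W], Z)

# Step 2: OLS of y on (1, x, estimated IMR) using the selected observations
sel = [i for i in range(n) if Z[i] == 1]
y_sel = [Y[i] for i in sel]
naive = ols([[1.0, X[i]] for i in sel], y_sel)
heckit = ols([[1.0, X[i], imr(gamma[0] + gamma[1] * W[i])] for i in sel], y_sel)
print("probit (gamma1, gamma2):  ", [round(v, 3) for v in gamma])
print("naive OLS (b1, b2):       ", [round(v, 3) for v in naive])
print("Heckit (b1, b2, b_lambda):", [round(v, 3) for v in heckit])
```

In this setup the naive slope on the selected sample is pulled below the true value of 2, while the second-step slope with the IMR included should land close to it, and the IMR coefficient estimates $\rho\sigma_e$ (0.8 here by construction).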

Page 18: Chapter  16

16.7.6b Heckit Example: Wages of Married Women

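Least squares wage equation, estimated on the working women only:

$$
\begin{aligned}
\ln(WAGE) &= -.4002 + .1095\,EDUC + .0157\,EXPER, \qquad R^2 = .1484 \\
\text{(t-stat)} &\qquad (-2.10) \qquad (7.73) \qquad\qquad (3.90)
\end{aligned}
\tag{16.43}
$$

Probit model of labor force participation (t-statistics shown for the four slope coefficients):

$$
\begin{aligned}
P(LFP = 1) &= \Phi(1.1923 - .0206\,AGE + .0838\,EDUC - .3139\,KIDS - 1.3939\,MTR) \\
\text{(t-stat)} &\qquad\qquad\qquad (-2.93) \qquad\quad (3.61) \qquad\quad (-2.54) \qquad\quad (-2.26)
\end{aligned}
$$

Estimated inverse Mills ratio:

$$
\widetilde{IMR} = \tilde{\lambda} = \frac{\phi(1.1923 - .0206\,AGE + .0838\,EDUC - .3139\,KIDS - 1.3939\,MTR)}{\Phi(1.1923 - .0206\,AGE + .0838\,EDUC - .3139\,KIDS - 1.3939\,MTR)}
$$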
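As a worked example of the IMR calculation, the sketch below plugs one hypothetical woman into the probit estimates of the example, with negative coefficients on AGE, KIDS and MTR. Her characteristics (age 40, 12 years of education, KIDS = 1, marginal tax rate 0.7) are made-up illustrative values, not observations from the data set.

```python
from statistics import NormalDist

N01 = NormalDist()

# Probit coefficients from the participation equation of the example
coef = {"const": 1.1923, "AGE": -0.0206, "EDUC": 0.0838,
        "KIDS": -0.3139, "MTR": -1.3939}

# Hypothetical woman (illustrative values only)
person = {"const": 1.0, "AGE": 40, "EDUC": 12, "KIDS": 1, "MTR": 0.7}

index = sum(coef[k] * person[k] for k in coef)  # probit index
p_lfp = N01.cdf(index)                          # P(LFP = 1)
imr = N01.pdf(index) / N01.cdf(index)           # inverse Mills ratio

print(f"index = {index:.4f}")
print(f"P(LFP = 1) = {p_lfp:.4f}")
print(f"IMR = {imr:.4f}")
```

The IMR computed this way for each working woman is the extra regressor that enters the wage equation in the second step.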

Page 19: Chapter  16

16.7.6b Heckit Example: Wages of Married Women

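The two-step Heckit wage equation, with the estimated IMR as an additional regressor, is

$$
\begin{aligned}
\ln(WAGE) &= .8105 + .0585\,EDUC + .0163\,EXPER - .8664\,IMR \\
\text{(t-stat)} &\quad\; (1.64) \qquad (2.45) \qquad\quad (4.08) \qquad\quad (-2.65) \\
\text{(t-stat-adj)} &\quad\; (1.33) \qquad (1.97) \qquad\quad (3.88) \qquad\quad (-2.17)
\end{aligned}
\tag{16.44}
$$

The adjusted t-statistics use standard errors corrected for the two-step estimation

The maximum likelihood estimated wage equation is

$$
\begin{aligned}
\ln(WAGE) &= .6686 + .0658\,EDUC + .0118\,EXPER \\
\text{(t-stat)} &\quad\; (2.84) \qquad (3.96) \qquad\quad (2.87)
\end{aligned}
$$

The standard errors based on the full information maximum likelihood procedure are smaller than those yielded by the two-step estimation method.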
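The test on the IMR coefficient can be illustrated with the reported numbers. The sketch below assumes a standard normal reference distribution for the t-statistics (an asymptotic approximation adopted here for illustration) and backs out the implied standard errors from the coefficient −.8664 and its unadjusted and adjusted t-statistics, −2.65 and −2.17.

```python
from statistics import NormalDist


def two_sided_p(t_stat):
    """Two-sided p-value from the standard normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


imr_coef = -0.8664
t_stats = {"unadjusted": -2.65, "adjusted": -2.17}

for label, t in t_stats.items():
    se = imr_coef / t  # implied standard error of the IMR coefficient
    print(f"{label:>10}: se = {se:.4f}, p-value = {two_sided_p(t):.4f}")

p_adjusted = two_sided_p(t_stats["adjusted"])
p_unadjusted = two_sided_p(t_stats["unadjusted"])
```

Even with the larger corrected standard error, the IMR coefficient is significant at the 5% level under this approximation, which points to correlated errors and a need for the selection correction.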

Page 20: Chapter  16

16.7.6 Sample Selection

We use two different data generating processes to explain the decision to work and the wage

But there is a subtle difference relative to the Cragg model

Here we have a third (unobservable) element explaining the decision to work

If that element is also in the error of the main equation, we have a problem of sample selection

Page 21: Chapter  16

16.7.6 Sample Selection

Heckit with normal errors is theoretically identified without any restriction on the regressors

In principle, the same regressors can appear in the equations for $y_1^*$ and $y_2^*$

In practice you want exclusion restrictions (something in the participation equation that is not in the outcome equation)

Otherwise you would be relying only on the nonlinearity of the inverse Mills ratio for identification. The inverse Mills ratio term is actually approximately linear over a wide range of its argument, which leads to multicollinearity issues

The problem is less severe the better a probit model can discriminate between participants and nonparticipants
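How close to linear the inverse Mills ratio is can be checked numerically. The sketch below computes the correlation between the probit index and the IMR over a grid; the range from −3 to 1 is an arbitrary choice for illustration.

```python
from statistics import NormalDist

N01 = NormalDist()


def imr(c):
    """Inverse Mills ratio phi(c) / Phi(c)."""
    return N01.pdf(c) / N01.cdf(c)


def correlation(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5


grid = [-3.0 + 0.1 * k for k in range(41)]  # index values from -3 to 1
r = correlation(grid, [imr(c) for c in grid])
print(f"correlation between index and IMR on [-3, 1]: {r:.4f}")
```

A correlation this close to −1 means that, when the outcome equation contains the same regressors as the selection equation, the IMR is nearly a linear combination of them, and its coefficient is hard to separate from theirs.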


Page 22: Chapter  16

16.7.6 Sample Selection

You would prefer to use exclusion restrictions, but it can be very difficult to find defensible ones

Page 23: Chapter  16

16.7.6 Sample Selection

Extensions

Plenty! And variations of the main model. And different names for the same models!

Example: what if the selection process is not just a binary decision but an ordered one? See oheckman (Chiburis, R. and M. Lokshin (2007))

Page 24: Chapter  16

16.7.6 Sample Selection

Extensions

Heckit handles linear regression models when there is a selection mechanism

However, if the outcome equation involves a dichotomous dependent variable too, we would have a probit selection equation and a probit outcome equation

That ‘double probit model’/‘bivariate probit model with selection’ can be estimated with heckprob in Stata

Page 25: Chapter  16

16.7.6 Sample Selection

Extensions

A simpler version is the bivariate probit (biprobit)


Page 26: Chapter  16

Keywords

binary choice models; censored data; conditional logit; count data models; feasible generalized least squares; Heckit; identification problem; independence of irrelevant alternatives (IIA); index models; individual and alternative specific variables; individual specific variables; latent variables; likelihood function; limited dependent variables; linear probability model; logistic random variable; logit; log-likelihood function; marginal effect; maximum likelihood estimation; multinomial choice models; multinomial logit; odds ratio; ordered choice models; ordered probit; ordinal variables; Poisson random variable; Poisson regression model; probit; selection bias; tobit model; truncated data

Page 27: Chapter  16

Further models

Survival analysis (time-to-event data analysis)

Multivariate probit (biprobit, triprobit, mvprobit)

Page 28: Chapter  16

References

Hoffmann (2004), for all topics

Long, S. and J. Freese, for all topics

Cameron and Trivedi’s book, for count data

Agresti, A. (2001) Categorical Data Analysis (2nd ed). New York: Wiley.