Binary Choice #2
Econ 674
Purdue University
Justin L. Tobias (Purdue) Binary Choice #2 1 / 23
Binary Choice
In the previous lecture, we discussed MLE estimation of popular binary choice models: the probit and logit.
Consider the following probit output, using data from Fair's (1978) study of the probability of having an extramarital affair:
Parameter Interpretation in Binary Choice Models
Table 14.1: Point Estimates and Standard Errors from Probit Model Using Fair's (1978) Data

Variable      MLE     Std. Error
CONS         -.726    (.417)
MALE          .154    (.131)
YS-MARRIED    .029    (.013)
KIDS          .256    (.159)
RELIGIOUS    -.514    (.124)
ED            .005    (.026)
HAPPY        -.514    (.125)
What do these coefficients mean?
Parameter Interpretation in Binary Choice Models
Once we depart from linear models, the coefficients themselves are no longer directly interpretable - they do not represent marginal effects.
To see this, consider a simple model with one covariate, which is binary. Specifically,

Pr(y = 1|x) = Φ(β1 + β2x),  x ∈ {0, 1}.

What would you do in order to quantify the impact of x on the outcome that y = 1? Most would consider the parameter:
Parameter Interpretation in Binary Choice Models
∆ ≡ Pr(y = 1|x = 1)− Pr(y = 1|x = 0) = Φ(β1 + β2)− Φ(β1).
Note that the sign of the parameter β2 is indicative of the sign of ∆.
To estimate ∆ we would, of course, use:

∆̂ = Φ(β̂1 + β̂2) − Φ(β̂1).
How would we calculate a standard error for ∆?
Parameter Interpretation in Binary Choice Models
The Delta method comes in handy here.
Specifically, denote the MLE estimate of the variance-covariance matrix of β = [β1 β2]′ as V.

We can obtain an estimate of the variance of ∆̂ as:

Var(∆̂) ≈ g′Vg,  where g = [φ(β̂1 + β̂2) − φ(β̂1)  φ(β̂1 + β̂2)]′.
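The delta-method calculation can be sketched numerically. The estimates and variance-covariance matrix below are hypothetical, illustrative numbers (not from Fair's data):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical probit MLEs and variance-covariance matrix (illustrative only)
b = np.array([-0.5, 0.8])            # [beta1, beta2]
V = np.array([[0.04, -0.01],
              [-0.01, 0.03]])        # estimated Var(beta_hat)

# Delta_hat = Phi(b1 + b2) - Phi(b1)
delta_hat = norm.cdf(b[0] + b[1]) - norm.cdf(b[0])

# Gradient of Delta with respect to (beta1, beta2)
g = np.array([norm.pdf(b[0] + b[1]) - norm.pdf(b[0]),
              norm.pdf(b[0] + b[1])])

# Delta-method variance: g' V g
se_delta = np.sqrt(g @ V @ g)
print(delta_hat, se_delta)
```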
Parameter Interpretation in Binary Choice Models
OK, but what if x is continuous rather than binary? In this case, we would seem to seek:

∂Pr(y = 1|x)/∂x = φ(β1 + β2x)β2.

Or, in the general case with many x′s, we would obtain:
Parameter Interpretation in Binary Choice Models
∂Pr(y = 1|x)/∂xj = φ(xβ)βj.
Again, note that the sign of βj is indicative of the sign of the marginal effect.
We would estimate this effect as

φ(xβ̂)β̂j,

and employ a similar method to calculate a variance associated with our estimate.
But wait a minute, there is an additional complication: the marginal effect φ(xβ)βj depends on the value of x at which it is evaluated. At which x should we compute it?
Parameter Interpretation in Binary Choice Models
There are infinitely many possibilities here, of course, but two conventions dominate the literature. These are: (1) evaluate the marginal effect at the sample averages of the covariates (the marginal effect for an "average" individual), or (2) evaluate the marginal effect at each observation and average these over the sample (the average marginal effect).
There seems to be a great deal of uncertainty regarding this (seemingly simple) point in the literature. Will these two approaches give you the same answer? Is there reason to prefer one over the other?
With respect to the equality of these parameters, consider the quote in your text by Greene (page 775):
“For computing marginal effects, one can evaluate the expressions at the sample means of the data or evaluate the marginal effects at every observation and use the sample average of the individual marginal effects. The functions are continuous with continuous first derivatives so Theorem D.12 [and others] apply; in large samples these will give the same answer.”
What do you think?
ME’s for an average individual
Certainly the most common convention in the literature is to calculate and report a marginal effect for an “average” individual. Letting x̄ denote the average value of the x′s in the sample, one could calculate

φ(x̄β̂)β̂j.
Parameter Interpretation in Binary Choice Models
With respect to this final point, again, it is convention to make a choice and select particular values of the x′s when they are discrete or binary.

For example, you may calculate the marginal effect for “white females living in the midwest who are married” rather than rounding these variables to mean values.
Parameter Interpretation in Binary Choice Models
Another alternative, which I prefer in general, though used comparatively rarely, is to focus on the average marginal effect in the population rather than a marginal effect for an average individual. The former could be estimated as:

(1/n) ∑i φ(xiβ̂)β̂j.

Note, also, that under general conditions, this sample average converges to the population average marginal effect, Ex[φ(xβ)]βj.

This approach has the notable advantage of not requiring a particular choice of covariate values. It also seems to provide a nice summary of the impact of xj on Pr(y = 1|x).
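The two conventions can be compared directly in code. The data and the "estimated" coefficients below are hypothetical (a simulated design using the slides' later values .5 and −.7, not actual MLEs):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Simulated data and hypothetical probit estimates (illustrative only)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
b = np.array([0.5, -0.7])                              # pretend these are the MLEs
j = 1                                                  # covariate of interest

# Convention 1: marginal effect at the sample means ("average individual")
x_bar = X.mean(axis=0)
me_at_means = norm.pdf(x_bar @ b) * b[j]

# Convention 2: average marginal effect over the sample
ame = np.mean(norm.pdf(X @ b)) * b[j]

print(me_at_means, ame)
```

In finite samples the two numbers differ, even though (as the Greene quote suggests) they coincide asymptotically.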
Parameter Interpretation in Binary Choice Models
Let us now return to our reference data, and add marginal effects (happy, religious men with kids, when needed!):
Table 14.1: Coefficient and Marginal Effect Posterior Means and Standard Deviations from Probit Model Using Fair's (1978) Data

             Coefficient          Marginal Effect
Variable     Mean    Std. Dev.    Mean    Std. Dev.
CONS        -.726    (.417)       ----    ----
MALE         .154    (.131)       .047    (.040)
YS-MARRIED   .029    (.013)       .009    (.004)
KIDS         .256    (.159)       .073    (.045)
RELIGIOUS   -.514    (.124)      -.150    (.034)
ED           .005    (.026)       .001    (.008)
HAPPY       -.514    (.125)      -.167    (.042)
The marginal effects are interpretable and economically meaningful, while the coefficients themselves are not.
A note on nonlinear least squares
As an alternative to MLE, consider the following method for estimation in the binary choice model:
Suppose

E(yi|xi) = Pr(yi = 1|xi) = Φ(xiβ).

Then, it would seem reasonable to obtain β̂ much like we did in a regression context:

β̂NLS = arg minβ ∑i [yi − Φ(xiβ)]².
This is called the nonlinear least-squares estimator of β.
A note on nonlinear least squares
Will this yield the same estimate as the probit MLE?
The short answer is no, as can be seen by the NLS FOCs:

∑i [yi − Φ(xiβ)] φ(xiβ)xi′ = 0,

which differ from the probit FOCs. Although the NLS estimator is consistent in general, it is not efficient relative to the probit MLE.
The following slide provides results from a generated data experiment.

In this experiment, n = 500 observations are generated from a probit model:

Pr(yi = 1|xi) = Φ(.5 − .7xi),

where the scalar xi variables are generated independently as iid standard normal.

Data sets are obtained in this fashion 1,000 different times. For each data set, we obtain the probit MLE as well as NLS estimates.

By comparing their sampling distributions, we can get a sense of the efficiency gains afforded by MLE.
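A small-scale sketch of this experiment (fewer observations and replications than the slides' n = 500 and 1,000 replications, so that it runs quickly) can be coded directly:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)

# Small-scale version of the experiment: n = 200, 100 replications
n, reps = 200, 100
b_true = np.array([0.5, -0.7])

def neg_loglik(b, y, X):
    # Probit log-likelihood, clipped for numerical stability
    F = np.clip(norm.cdf(X @ b), 1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(F) + (1 - y) * np.log(1 - F))

def ssr(b, y, X):
    # Nonlinear least-squares criterion
    return np.sum((y - norm.cdf(X @ b)) ** 2)

mles, nlss = [], []
for _ in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = (rng.uniform(size=n) < norm.cdf(X @ b_true)).astype(float)
    mles.append(minimize(neg_loglik, np.zeros(2), args=(y, X), method="BFGS").x)
    nlss.append(minimize(ssr, np.zeros(2), args=(y, X), method="BFGS").x)

var_mle = np.var(np.array(mles), axis=0)
var_nls = np.var(np.array(nlss), axis=0)
print(var_mle / var_nls)  # ratios below 1 indicate an efficiency gain for the MLE
```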
[Figure: histograms of the sampling distributions of the estimates across the 1,000 generated data sets; the four panels are β0: Probit, β1: Probit, β0: NLS, and β1: NLS.]
The variance of the probit intercept estimates was about 92 percent of the analogous NLS variance. Similarly, the variance of the probit slopes was about 81 percent of the NLS slope variance.
We close by noting that the vast majority of binary choice analyses conducted in applied work involve either the LPM, probit, or logit.
Another link function worth considering (among many possibilities), and one that can give predictions different from both of these, is the so-called complementary log-log link function:

Pr(y = 1|x) = 1 − exp[−exp(xβ)].
This link is sometimes (mistakenly) called the Weibull link. Actually, it isbased on the Type I extreme value (Gumbel) distribution.
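The three links are easy to compare on a grid of index values; a minimal sketch:

```python
import numpy as np
from scipy.stats import norm

# Evaluate the three link functions on a grid of index values xb
z = np.linspace(-5, 5, 101)

probit = norm.cdf(z)                 # Phi(xb)
logit = 1.0 / (1.0 + np.exp(-z))     # logistic cdf
cloglog = 1.0 - np.exp(-np.exp(z))   # complementary log-log

# Probit and logit are symmetric about zero; the cloglog link is not:
# at xb = 0 it gives 1 - exp(-1), not 0.5
print(probit[50], logit[50], cloglog[50])
```

This asymmetry is one reason the complementary log-log link can produce predictions that differ from both the probit and the logit.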
[Figure: the logit, probit, and log-log link functions plotted as Pr(y = 1) against the index over the range −5 to 5.]
For reference, here are results from the logit and complementary log-log models for the Fair data:
             Coefficients                        Marginal Effects
Variable     Probit   Logit (Ratio)   CLogLog    Probit   Logit   CLogLog
CONS         -.726    -1.29 (1.78)    -1.83      ----     ----    ----
MALE          .154     .246 (1.60)     .131      .047     .049    .026
YS-MARRIED    .029     .049 (1.69)     .042      .009     .009    .008
KIDS          .256     .439 (1.72)     .285      .073     .073    .054
RELIGIOUS    -.514    -.893 (1.74)    -.777     -.150    -.151   -.148
ED            .005     .015 (2.88)     .044      .001     .003    .008
HAPPY        -.514    -.869 (1.70)    -.743     -.167    -.167   -.164
What general points can you make?
Model comparison and summary of fit
As a quick and simple way to compare performance across models, one can simply look to the maximized log-likelihoods of each specification, since the models contain an equal number of parameters.
Sometimes, ad hoc criteria are used to assess fit. For example, one might adopt the rule of thumb to predict ŷi = 1 whenever F(xiβ̂) > .5 (and ŷi = 0 otherwise). The quality of a model's fit is then determined by the fraction of observations correctly predicted. Such scoring rules are reasonably common, though somewhat arbitrary.
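The scoring rule above can be sketched in a few lines. The fitted probabilities below are hypothetical toy values, not output from the Fair data:

```python
import numpy as np

# Hypothetical fitted probabilities F(x_i b_hat) and observed outcomes
F_hat = np.array([0.81, 0.35, 0.62, 0.10, 0.55, 0.47])
y = np.array([1, 0, 1, 0, 0, 1])

y_pred = (F_hat > 0.5).astype(int)   # predict 1 whenever F > .5
hit_rate = np.mean(y_pred == y)      # fraction correctly predicted
print(hit_rate)
```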
Model comparison and summary of fit
Finally, McFadden (1974, J Pub E) suggests the pseudo-R² measure:

pseudo-R² = 1 − ln L̂ / ln L̂0,

where L̂ is the maximized likelihood of the estimated model and L̂0 is that of an intercept-only model, noting that the intercept-only model predicts y using the sample frequency of y = 1, and the full model will in general do better. (Note as well the analogy with R² from a linear regression.) Furthermore, in the limiting case of perfect prediction, we have Fi = 1 whenever yi = 1 and, similarly, Fi = 0 whenever yi = 0. In this extreme case, the pseudo-R² = 1.
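The measure can be computed by fitting the full and intercept-only probit models; a sketch on simulated data (a hypothetical design, not the Fair data):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)

# Simulated probit data (hypothetical design)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.uniform(size=n) < norm.cdf(X @ np.array([0.2, 1.0]))).astype(float)

def loglik(b, Z):
    # Probit log-likelihood, clipped for numerical stability
    F = np.clip(norm.cdf(Z @ b), 1e-10, 1 - 1e-10)
    return np.sum(y * np.log(F) + (1 - y) * np.log(1 - F))

# Maximized log-likelihoods of the full and intercept-only models
ll_full = loglik(minimize(lambda b: -loglik(b, X), np.zeros(2)).x, X)
ll_null = loglik(minimize(lambda b: -loglik(b, X[:, :1]), np.zeros(1)).x, X[:, :1])

pseudo_r2 = 1.0 - ll_full / ll_null
print(pseudo_r2)
```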