Binary Choice #2
Econ 674
Purdue University
Justin L. Tobias (Purdue) Binary Choice #2 1 / 23
Binary Choice
In the previous lecture, we discussed MLE estimation of popular binary choice models: the probit and logit.
Consider the following probit output, using data from Fair's (1978) study of the probability of having an extramarital affair:
Parameter Interpretation in Binary Choice Models
Table 14.1: Point Estimates and Standard Errors from Probit Model Using Fair's (1978) Data

Variable      MLE     Std. Error
CONS         -.726    (.417)
MALE          .154    (.131)
YS-MARRIED    .029    (.013)
KIDS          .256    (.159)
RELIGIOUS    -.514    (.124)
ED            .005    (.026)
HAPPY        -.514    (.125)
What do these coefficients mean?
Parameter Interpretation in Binary Choice Models
Once we depart from linear models, the coefficients themselves are no longer directly interpretable - they do not represent marginal effects.
To see this, consider a simple model with one covariate, which is binary. Specifically,

Pr(y = 1|x) = Φ(β1 + β2x),  x ∈ {0, 1}.

What would you do in order to quantify the impact of x on the outcome that y = 1? Most would consider the parameter:
Parameter Interpretation in Binary Choice Models
∆ ≡ Pr(y = 1|x = 1)− Pr(y = 1|x = 0) = Φ(β1 + β2)− Φ(β1).
Note that the sign of the parameter β2 is indicative of the sign of ∆.
To estimate ∆ we would, of course, use:

∆̂ = Φ(β̂1 + β̂2) − Φ(β̂1).
How would we calculate a standard error for ∆?
Parameter Interpretation in Binary Choice Models
The Delta method comes in handy here.
Specifically, denote the MLE estimate of the variance-covariance matrix of β = [β1 β2]′ as V.

We can obtain an estimate of the variance of ∆̂ as:

Var(∆̂) ≈ g′Vg,  where g = [φ(β̂1 + β̂2) − φ(β̂1)  φ(β̂1 + β̂2)]′.
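The delta-method calculation can be sketched numerically. The estimates and variance-covariance matrix below are hypothetical, illustrative numbers (not from Fair's data):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical probit MLEs and variance-covariance matrix (illustrative only)
b = np.array([-0.5, 0.8])            # [beta1, beta2]
V = np.array([[0.04, -0.01],
              [-0.01, 0.03]])        # estimated Var(beta_hat)

# Delta_hat = Phi(b1 + b2) - Phi(b1)
delta_hat = norm.cdf(b[0] + b[1]) - norm.cdf(b[0])

# Gradient of Delta with respect to (beta1, beta2)
g = np.array([norm.pdf(b[0] + b[1]) - norm.pdf(b[0]),
              norm.pdf(b[0] + b[1])])

# Delta-method variance: g' V g
se_delta = np.sqrt(g @ V @ g)
print(delta_hat, se_delta)
```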
Parameter Interpretation in Binary Choice Models
OK, but what if x is continuous rather than binary? In this case, we would seem to seek:

∂Pr(y = 1|x)/∂x = φ(β1 + β2x)β2.

Or, in the general case with many x′s, we would obtain:
Parameter Interpretation in Binary Choice Models
∂Pr(y = 1|x)/∂xj = φ(xβ)βj.
Again, note that the sign of βj is indicative of the sign of the marginal effect.
We would estimate this effect as

φ(xβ̂)β̂j,

and employ a similar method to calculate a variance associated with our estimate.
But wait a minute, there is an additional complication: the marginal effect φ(xβ)βj depends on the value of x at which it is evaluated. At which x should we compute it?
Parameter Interpretation in Binary Choice Models
There are infinitely many possibilities here, of course, but two conventions dominate the literature. These are: (1) evaluate the marginal effect at the sample averages of the covariates (the marginal effect for an "average" individual), or (2) evaluate the marginal effect at each observation and average these over the sample (the average marginal effect).
There seems to be a great deal of uncertainty regarding this (seemingly simple) point in the literature. Will these two approaches give you the same answer? Is there reason to prefer one over the other?
With respect to the equality of these parameters, consider the quote in your text by Greene (page 775):
“For computing marginal effects, one can evaluate the expressions at the sample means of the data or evaluate the marginal effects at every observation and use the sample average of the individual marginal effects. The functions are continuous with continuous first derivatives so Theorem D.12 [and others] apply; in large samples these will give the same answer.”
What do you think?
ME’s for an average individual
Certainly the most common convention in the literature is to calculate and report a marginal effect for an “average” individual. Letting x̄ denote the average value of the x′s in the sample, one could calculate

φ(x̄β̂)β̂j.
Parameter Interpretation in Binary Choice Models
With respect to this final point, again, it is convention to make a choice and select particular values of the x′s when they are discrete or binary.

For example, you may calculate the marginal effect for “white females living in the midwest who are married” rather than rounding these variables to mean values.
Parameter Interpretation in Binary Choice Models
Another alternative, which I prefer in general, though used comparatively rarely, is to focus on the average marginal effect in the population rather than a marginal effect for an average individual. The former could be estimated as:

(1/n) ∑i φ(xiβ̂)β̂j.

Note, also, that under general conditions, this sample average converges to the population average marginal effect, Ex[φ(xβ)]βj.

This approach has the notable advantage of not requiring a particular choice of covariate values. It also seems to provide a nice summary of the impact of xj on Pr(y = 1|x).
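The two conventions can be compared directly in code. The data and the "estimated" coefficients below are hypothetical (a simulated design using the slides' later values .5 and −.7, not actual MLEs):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Simulated data and hypothetical probit estimates (illustrative only)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
b = np.array([0.5, -0.7])                              # pretend these are the MLEs
j = 1                                                  # covariate of interest

# Convention 1: marginal effect at the sample means ("average individual")
x_bar = X.mean(axis=0)
me_at_means = norm.pdf(x_bar @ b) * b[j]

# Convention 2: average marginal effect over the sample
ame = np.mean(norm.pdf(X @ b)) * b[j]

print(me_at_means, ame)
```

In finite samples the two numbers differ, even though (as the Greene quote suggests) they coincide asymptotically.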
Parameter Interpretation in Binary Choice Models
Let us now return to our reference data, and add marginal effects (happy, religious men with kids, when needed!):
Table 14.1: Coefficient and Marginal Effect Posterior Means and Standard Deviations from Probit Model Using Fair's (1978) Data

             Coefficient          Marginal Effect
Variable     Mean    Std. Dev.    Mean    Std. Dev.
CONS        -.726    (.417)       ----    ----
MALE         .154    (.131)       .047    (.040)
YS-MARRIED   .029    (.013)       .009    (.004)
KIDS         .256    (.159)       .073    (.045)
RELIGIOUS   -.514    (.124)      -.150    (.034)
ED           .005    (.026)       .001    (.008)
HAPPY       -.514    (.125)      -.167    (.042)
The marginal effects are interpretable and economically meaningful, while the coefficients themselves are not.
A note on nonlinear least squares
As an alternative to MLE, consider the following method for estimation in the binary choice model:
Suppose

E(yi|xi) = Pr(yi = 1|xi) = Φ(xiβ).

Then, it would seem reasonable to obtain β̂ much like we did in a regression context:

β̂NLS = arg minβ ∑i [yi − Φ(xiβ)]².
This is called the nonlinear least-squares estimator of β.
A note on nonlinear least squares
Will this yield the same estimate as the probit MLE?
The short answer is no, as can be seen by the NLS FOCs:

∑i [yi − Φ(xiβ)] φ(xiβ)xi′ = 0,

which differ from the probit FOCs. Although the NLS estimator is consistent in general, it is not efficient relative to the probit MLE.
The following slide provides results from a generated data experiment.

In this experiment, n = 500 observations are generated from a probit model:

Pr(yi = 1|xi) = Φ(.5 − .7xi),

where the scalar xi variables are generated independently as iid standard normal.

Data sets are obtained in this fashion 1,000 different times. For each data set, we obtain the probit MLE as well as NLS estimates.

By comparing their sampling distributions, we can get a sense of the efficiency gains afforded by MLE.
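A small-scale sketch of this experiment (fewer observations and replications than the slides' n = 500 and 1,000 replications, so that it runs quickly) can be coded directly:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)

# Small-scale version of the experiment: n = 200, 100 replications
n, reps = 200, 100
b_true = np.array([0.5, -0.7])

def neg_loglik(b, y, X):
    # Probit log-likelihood, clipped for numerical stability
    F = np.clip(norm.cdf(X @ b), 1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(F) + (1 - y) * np.log(1 - F))

def ssr(b, y, X):
    # Nonlinear least-squares criterion
    return np.sum((y - norm.cdf(X @ b)) ** 2)

mles, nlss = [], []
for _ in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = (rng.uniform(size=n) < norm.cdf(X @ b_true)).astype(float)
    mles.append(minimize(neg_loglik, np.zeros(2), args=(y, X), method="BFGS").x)
    nlss.append(minimize(ssr, np.zeros(2), args=(y, X), method="BFGS").x)

var_mle = np.var(np.array(mles), axis=0)
var_nls = np.var(np.array(nlss), axis=0)
print(var_mle / var_nls)  # ratios below 1 indicate an efficiency gain for the MLE
```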
[Figure: histograms of the sampling distributions of the estimates across the 1,000 generated data sets; the four panels are β0: Probit, β1: Probit, β0: NLS, and β1: NLS.]
The variance of the probit intercept estimates was about 92 percent of the analogous NLS variance. Similarly, the variance of the probit slopes was about 81 percent of the NLS slope variance.
We close by noting that the vast majority of binary choice analyses conducted in applied work involve either the LPM, probit, or logit.
Another link function worth considering (among many possibilities), and one that can give predictions different from both of these, is the so-called complementary log-log link function:

Pr(y = 1|x) = 1 − exp[−exp(xβ)].
This link is sometimes (mistakenly) called the Weibull link. Actually, it isbased on the Type I extreme value (Gumbel) distribution.
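The three links are easy to compare on a grid of index values; a minimal sketch:

```python
import numpy as np
from scipy.stats import norm

# Evaluate the three link functions on a grid of index values xb
z = np.linspace(-5, 5, 101)

probit = norm.cdf(z)                 # Phi(xb)
logit = 1.0 / (1.0 + np.exp(-z))     # logistic cdf
cloglog = 1.0 - np.exp(-np.exp(z))   # complementary log-log

# Probit and logit are symmetric about zero; the cloglog link is not:
# at xb = 0 it gives 1 - exp(-1), not 0.5
print(probit[50], logit[50], cloglog[50])
```

This asymmetry is one reason the complementary log-log link can produce predictions that differ from both the probit and the logit.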
[Figure: the logit, probit, and log-log link functions plotted as Pr(y = 1) against the index over the range −5 to 5.]
For reference, here are results from the logit and complementary log-log models for the Fair data:
             Coefficients                        Marginal Effects
Variable     Probit   Logit (Ratio)   CLogLog    Probit   Logit   CLogLog
CONS         -.726    -1.29 (1.78)    -1.83      ----     ----    ----
MALE          .154     .246 (1.60)     .131      .047     .049    .026
YS-MARRIED    .029     .049 (1.69)     .042      .009     .009    .008
KIDS          .256     .439 (1.72)     .285      .073     .073    .054
RELIGIOUS    -.514    -.893 (1.74)    -.777     -.150    -.151   -.148
ED            .005     .015 (2.88)     .044      .001     .003    .008
HAPPY        -.514    -.869 (1.70)    -.743     -.167    -.167   -.164
What general points can you make?
Model comparison and summary of fit
As a quick and simple way to compare performance across models, one can simply look to the maximized log-likelihoods of each specification, since the models contain an equal number of parameters.
Sometimes, ad hoc criteria are used to assess fit. For example, one might adopt the rule of thumb to predict ŷi = 1 whenever F(xiβ̂) > .5 (and ŷi = 0 otherwise). The quality of a model's fit is then determined by the fraction of observations correctly predicted. Such scoring rules are reasonably common, though somewhat arbitrary.
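The scoring rule above can be sketched in a few lines. The fitted probabilities below are hypothetical toy values, not output from the Fair data:

```python
import numpy as np

# Hypothetical fitted probabilities F(x_i b_hat) and observed outcomes
F_hat = np.array([0.81, 0.35, 0.62, 0.10, 0.55, 0.47])
y = np.array([1, 0, 1, 0, 0, 1])

y_pred = (F_hat > 0.5).astype(int)   # predict 1 whenever F > .5
hit_rate = np.mean(y_pred == y)      # fraction correctly predicted
print(hit_rate)
```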
Model comparison and summary of fit
Finally, McFadden (1974, J Pub E) suggests the pseudo-R² measure:

pseudo-R² = 1 − ln L̂ / ln L̂0,

where L̂ is the maximized likelihood of the estimated model and L̂0 is that of an intercept-only model, noting that the intercept-only model predicts y using the sample frequency of y = 1, and the full model will in general do better. (Note as well the analogy with R² from a linear regression.) Furthermore, in the limiting case of perfect prediction, we have Fi = 1 whenever yi = 1 and, similarly, Fi = 0 whenever yi = 0. In this extreme case, the pseudo-R² = 1.
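The measure can be computed by fitting the full and intercept-only probit models; a sketch on simulated data (a hypothetical design, not the Fair data):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)

# Simulated probit data (hypothetical design)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.uniform(size=n) < norm.cdf(X @ np.array([0.2, 1.0]))).astype(float)

def loglik(b, Z):
    # Probit log-likelihood, clipped for numerical stability
    F = np.clip(norm.cdf(Z @ b), 1e-10, 1 - 1e-10)
    return np.sum(y * np.log(F) + (1 - y) * np.log(1 - F))

# Maximized log-likelihoods of the full and intercept-only models
ll_full = loglik(minimize(lambda b: -loglik(b, X), np.zeros(2)).x, X)
ll_null = loglik(minimize(lambda b: -loglik(b, X[:, :1]), np.zeros(1)).x, X[:, :1])

pseudo_r2 = 1.0 - ll_full / ll_null
print(pseudo_r2)
```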