8/13/2019 MicroEconometrics Lecture10
Economics 440.618 Microeconometrics
Michael T. Sandfort
Department of Applied Economics
The Johns Hopkins University
November 19, 2013
Limited Dependent Variable Models
The models we have looked at so far are appropriate if y is a continuous quantitative variable, e.g., demand, price, output, wage rate, etc.
But there are many interesting economic decisions which are tough to characterize continuously:
For a worker: accept or reject a job offer?
For a firm: enter (if out) or exit (if in) a market?
For a woman: whether to have a child? If so, how many?
For a consumer: which brand of peanut butter to buy?
Today, we are going to start discussing the econometrics of this kind of data, starting with a very simple model: the binary choice model.
The binary choice model is a good entry point to talking about more complicated limited dependent variable models, which is where we will end the course.
Binary Response Model
We say that a dependent variable y we want to model is binary if it takes on only two values. We typically recode these two values as 1 (yes, true, success, accepted, survived, etc.) and 0 (no, false, failure, rejected, died, etc.).
Whenever the variable we want to model is binary, it's natural to think in terms of probabilities:
What is the probability that an individual with characteristics $x$ owns a home?
If the person's characteristics were $x'$ rather than $x$, how would that affect the probability that they own a home?
Data for such dependent variables looks just like a dummy variable that one might use as an explanatory variable in a regression.
But the implications of having it on the left-hand side of our structural equation rather than the right-hand side are significant.
Binary Response Model
Assuming we have a random sample, the sample mean of our binary variable is an unbiased estimate of the unconditional probability of success ($y = 1$). That is,
$$
E\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n}\, n\, E(y) = E(y) = 1 \cdot \Pr(y = 1) + 0 \cdot \Pr(y = 0) = \Pr(y = 1)
$$
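A small simulation can make the unbiasedness claim concrete. This is a minimal sketch, assuming an illustrative success probability of 0.3 and a sample size of 200 (neither comes from the lecture): one sample mean lands near Pr(y = 1), and the average of many sample means lands very close to it.

```r
# Simulation: the sample mean of a 0/1 variable estimates Pr(y = 1).
# p_true = 0.3 and n = 200 are illustrative assumptions, not lecture values.
set.seed(42)
p_true <- 0.3
n      <- 200

# One sample: ybar is the estimate of Pr(y = 1)
y    <- rbinom(n, size = 1, prob = p_true)
ybar <- mean(y)

# Unbiasedness: averaging ybar over many independent samples approaches p_true
reps  <- 5000
ybars <- replicate(reps, mean(rbinom(n, size = 1, prob = p_true)))

print(c(one_sample = ybar, average_over_samples = mean(ybars), truth = p_true))
```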
Estimating the unconditional probability of success is fine, but it won't allow us to explore many interesting policy questions.
Compare "What is the overall rate of home ownership?" with "What would be the effect on the home ownership rate of rescinding the home mortgage interest tax deduction?"
Can we use OLS to address a problem like this?
OLS Estimation: The Linear Probability Model (LPM)
A naive way to approach this problem is to simply apply the classical regression model,
$$ y = x\beta + u, $$
with OLS.1 and OLS.2.
Using the same logic as on the previous slide, we know that $\Pr(y = 1 \mid x) = E(y \mid x) = x\beta$. Since probabilities must sum to one, it is also true that
$$ \Pr(y = 0 \mid x) = 1 - \Pr(y = 1 \mid x) = 1 - x\beta. $$
This is a binary response model where the probability of success is a linear function of $x$, hence the name.
In the linear probability model, the incremental change in probability due to a discrete change $\Delta x_j$ in $x_j$ is $\Delta \Pr(y = 1 \mid x) = \beta_j \Delta x_j$.
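The constant-marginal-effect property can be checked directly with `lm`. This is a minimal sketch on simulated data, assuming a logistic data-generating process with slope 1.5 and n = 1000 (illustrative choices, not the lecture's data; `p_at` is a hypothetical helper). It confirms that a change `dx` moves the fitted probability by exactly the slope times `dx` at any starting point, and it counts fitted "probabilities" outside [0, 1].

```r
# Linear probability model on simulated data. Assumption for illustration:
# the true Pr(y = 1 | x) is logistic, so a straight line is misspecified.
set.seed(1)
n <- 1000
x <- rnorm(n, mean = 0, sd = 2)
y <- rbinom(n, size = 1, prob = plogis(1.5 * x))

lpm <- lm(y ~ x)   # OLS of a 0/1 outcome on x
b   <- coef(lpm)

# p_at() is a hypothetical helper: fitted LPM probability at x = x0
p_at <- function(x0) predict(lpm, newdata = data.frame(x = x0))

# A change dx shifts the fitted probability by b["x"] * dx wherever x starts
dx <- 0.5
change_low  <- p_at(-3 + dx) - p_at(-3)
change_high <- p_at( 3 + dx) - p_at( 3)

# Share of fitted values that are not valid probabilities
share_outside <- mean(fitted(lpm) < 0 | fitted(lpm) > 1)

print(c(slope = unname(b["x"]), change_low = unname(change_low),
        change_high = unname(change_high), share_outside = share_outside))
```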
LPM: An Example
Consider again the data on women's wages we've looked at before. Rather than dropping the zero-hours observations, we'll try to directly model the labor force participation decision.
Our dependent variable will be y = inlf, which takes the value 1 if the woman was in the labor force and 0 otherwise. We fit a model of labor force participation:
> print(coeftest(res))
t test of coefficients:

               Estimate  Std. Error t value  Pr(>|t|)
(Intercept)  0.58551922  0.15417800  3.7977 0.0001579 ***
nwifeinc    -0.00340517  0.00144849 -2.3508 0.0189908 *
educ         0.03799530  0.00737602  5.1512 3.317e-07 ***
exper        0.03949239  0.00567267  6.9619 7.376e-12 ***
expersq     -0.00059631  0.00018479 -3.2270 0.0013059 **
age         -0.01609081  0.00248468 -6.4760 1.709e-10 ***
kidslt6     -0.26181047  0.03350579 -7.8139 1.889e-14 ***
kidsge6      0.01301223  0.01319596  0.9861 0.3244154
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This looks OK, but consider the following...
LPM: An Example
On the left is what the data and fitted regression line look like with just inlf as a function of exper. On the right is the marginal effect of an increase in experience on the probability of labor force participation:
[Figure. Left panel: inlf against exper with the fitted LPM line. Right panel: implied change in the probability of labor force participation against exper.]
Note that the predicted probability $\Pr(y = 1 \mid \text{exper}) > 1$ for exper > 10, and that the change $\Delta \Pr$ exceeds one (100%) for exper > 30.
Concerns About LPM
Some of the problems are fairly evident, but worth highlighting:
(LHS figure) Some plausible combinations of independent variables give fitted/predicted probabilities less than zero or greater than one. Since a probability must lie between zero and one, this can be embarrassing.
(RHS figure) It doesn't really make sense to say that a probability measure is linearly related to a continuously varying independent variable over a large range. Values of experience above 30 are within the range of the data, yet they imply changes in probability of over 100%.
Heteroskedasticity is baked into the model in the sense that
$$
\begin{aligned}
V(u \mid x) &= \Pr(y = 1 \mid x)\,[1 - x\beta]^2 + \Pr(y = 0 \mid x)\,[0 - x\beta]^2 \\
&= x\beta\,[1 - x\beta]^2 + (1 - x\beta)\,[x\beta]^2 \\
&= x\beta\,(1 - x\beta)\,\big[(1 - x\beta) + x\beta\big] = x\beta\,[1 - x\beta],
\end{aligned}
$$
which is clearly a function of $x$. Thus, use of a robust variance-covariance matrix is compulsory.
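To see what "robust" means computationally, here is a minimal sketch of HC0 (White) standard errors computed by hand in base R, using the sandwich form $(X'X)^{-1}\left(\sum_i \hat u_i^2 x_i' x_i\right)(X'X)^{-1}$. The helper name `hc0_ols` is hypothetical, and the simulated design (a true linear probability 0.1 + 0.8x, n = 800) is an illustrative assumption, not the lecture's data.

```r
# HC0 ("White") heteroskedasticity-robust standard errors for an LPM,
# computed by hand. hc0_ols is a hypothetical helper written for this sketch.
hc0_ols <- function(X, y) {
  XtX_inv <- solve(crossprod(X))             # (X'X)^{-1}
  beta    <- XtX_inv %*% crossprod(X, y)     # OLS coefficients
  u       <- as.vector(y - X %*% beta)       # OLS residuals
  meat    <- crossprod(X * u)                # sum_i u_i^2 x_i' x_i
  V       <- XtX_inv %*% meat %*% XtX_inv    # sandwich variance matrix
  list(beta = as.vector(beta), se = sqrt(diag(V)))
}

# Simulated LPM data (assumption: true Pr(y = 1 | x) = 0.1 + 0.8 x)
set.seed(2)
n <- 800
x <- runif(n)
y <- rbinom(n, size = 1, prob = 0.1 + 0.8 * x)
X <- cbind(1, x)

robust    <- hc0_ols(X, y)
classical <- sqrt(diag(vcov(lm(y ~ x))))     # usual homoskedastic SEs
print(rbind(robust = robust$se, classical = unname(classical)))
```

If the sandwich and lmtest packages are installed, `coeftest(res, vcov. = vcovHC(res, type = "HC0"))` should give the same kind of robust table for a fitted `lm` object such as `res`.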
Concerns About LPM
Finally, the residuals from the LPM are clearly non-normal, as seen in the figure: for any given $x$, the residual can take only two values, $1 - x\hat\beta$ or $-x\hat\beta$. A kernel density plot (a continuous approximation of a histogram) of the residuals is shown in red. The scaled normal density is shown in green for reference.
[Figure: Histogram and Kernel Density for LPM Residuals (horizontal axis: Residuals, -1.0 to 1.0; vertical axis: Density)]
An Alternative Model
To address these problems in the LPM, consider an alternative binary response model of the form
$$ \Pr(y = 1 \mid x) = G(\beta_1 + \beta_2 x_2 + \cdots + \beta_k x_k) = G(x\beta), $$
where $G(w)$ is a function taking values strictly between zero and one ($0 < G(w) < 1$ for all real $w$).
The Logit Model
Two CDFs $G(w)$ are used in most applications. The first is the logit CDF, which has the form
$$ G(x\beta) = \frac{\exp(x\beta)}{1 + \exp(x\beta)} = \Lambda(x\beta). $$
[Figure: The Logit CDF $\Lambda(w)$, plotted for $w$ from -3 to 3]
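The logit CDF is available in base R as `plogis`. This sketch defines the formula directly (the name `Lambda` is a hypothetical helper for illustration) and compares it with `plogis`; it also shows why a library routine is preferable in practice, since the naive formula overflows to `NaN` for large arguments.

```r
# Logit CDF written out from the formula; Lambda is a hypothetical helper name.
Lambda <- function(w) exp(w) / (1 + exp(w))

w <- seq(-3, 3, by = 0.5)
print(round(cbind(w = w, Lambda = Lambda(w), plogis = plogis(w)), 4))

# Numerical caveat: exp(800) overflows to Inf, so Inf / Inf gives NaN,
# while plogis() evaluates the same CDF stably.
print(c(naive = Lambda(800), plogis = plogis(800)))
```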
The Probit Model
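The second is the standard normal CDF, which has the form
$$ G(x\beta) = \int_{-\infty}^{x\beta} \phi(w)\,dw = \Phi(x\beta), $$
where
$$ \phi(w) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{w^2}{2}\right). $$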
[Figure: The Probit (Normal) CDF $\Phi(w)$, plotted for $w$ from -3 to 3]
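The integral definition of $\Phi$ can be checked numerically. This sketch writes $\phi$ from the formula above, integrates it with base R's `integrate`, and compares the result with `pnorm`; the helper names `phi` and `Phi_by_integration` are hypothetical and used only for illustration.

```r
# Standard normal density from the formula (equivalent to dnorm)
phi <- function(w) exp(-w^2 / 2) / sqrt(2 * pi)

# Phi(x) as the integral of phi from -Inf to x (hypothetical helper)
Phi_by_integration <- function(x) integrate(phi, lower = -Inf, upper = x)$value

w <- c(-2, -1, 0, 1, 2)
print(cbind(w = w,
            integrated = sapply(w, Phi_by_integration),
            pnorm = pnorm(w)))
```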
How to estimate?
Both of these functions increase rapidly near the origin and much less rapidly toward the extremes. That is, the partial/marginal effect of an increase in $x_j$ changes, depending on the level of $x$: formally, $\partial \Pr(y = 1 \mid x)/\partial x_j = g(x\beta)\,\beta_j$, where $g = G'$ is the corresponding density.
This is a desirable feature, because constant marginal effects were one of the objections to the LPM. Nonetheless, it raises a new challenge, because this is no longer a model that we can estimate by OLS.
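The way the marginal effect depends on the index $x\beta$ is easy to tabulate. This sketch assumes a hypothetical coefficient $\beta_j = 0.8$ and a handful of index values; it uses base R densities (`dlogis` for the logit, where $g = \Lambda(1 - \Lambda)$, and `dnorm` for the probit) and checks the logit formula against a finite-difference derivative.

```r
# Marginal effect in an index model Pr(y = 1 | x) = G(x b):
#   dPr/dx_j = g(x b) * b_j, with g = G'.
# b_j = 0.8 and the index values are hypothetical, for illustration only.
b_j   <- 0.8
index <- c(-3, -1, 0, 1, 3)            # values of x b

me_logit  <- dlogis(index) * b_j       # logit: g = Lambda * (1 - Lambda)
me_probit <- dnorm(index)  * b_j       # probit: g = phi

# Finite-difference check: raising x_j by h raises the index by b_j * h
h <- 1e-6
me_numeric <- (plogis(index + b_j * h) - plogis(index - b_j * h)) / (2 * h)

print(round(cbind(index, me_logit, me_probit, me_numeric), 5))
```

Both effects peak at an index of zero and shrink toward the tails, which is exactly the non-constant behavior described above.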
The approach usually taken to estimation of index function models like the logit and probit is maximum likelihood (ML).
ML estimation is a technique with very broad applications, so we'll spend the rest of this class talking about the ML estimator and then apply it specifically to the logit and probit.
Maximum Likelihood (ML) Estimation
The principle underlying ML estimation is fairly simple to articulate. Consider the figure below, which shows the histogram of a sample $y = (y_1, \ldots, y_n)$ from a univariate random variable $y$. The sample looks like it could have come from a normal distribution, so the same figure shows several normal densities $N(\mu, \sigma^2)$ with different $\mu$ and $\sigma^2$. From which density does the data most likely come?
[Figure: Histogram of y (horizontal axis: y, -4 to 4; vertical axis: Density), with several candidate normal densities overlaid]
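The "which density fits best" question can be answered by comparing log-likelihoods. This minimal sketch assumes a simulated N(0, 1) sample of 500 draws and four hypothetical candidate $(\mu, \sigma^2)$ pairs; the candidate with the largest sample log-likelihood is the one the data most likely came from.

```r
# Score several candidate normal densities by the log-likelihood of the sample.
# The simulated data and the candidate (mu, sigma^2) pairs are assumptions.
set.seed(3)
y <- rnorm(500, mean = 0, sd = 1)

candidates <- data.frame(mu     = c(-1, 0, 0, 1),
                         sigma2 = c( 1, 1, 4, 1))

# Sum of log-densities of all observations under each candidate
candidates$loglik <- mapply(
  function(m, s2) sum(dnorm(y, mean = m, sd = sqrt(s2), log = TRUE)),
  candidates$mu, candidates$sigma2)

print(candidates)
best <- candidates[which.max(candidates$loglik), ]
```

ML estimation replaces this short list of candidates with a search over all admissible parameter values.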
Maximum Likelihood (ML) Estimation
ML estimation is just a way of formalizing this procedure, so you can use the technique in situations with many more parameters: situations where your ability to visualize may fail you.
Formally, suppose a population in which the random variable $y$ is distributed according to some distribution with density $f(y, \theta)$, where $\theta$ is a vector of parameters.
Also suppose that we have a random sample $y = (y_1, \ldots, y_n)$ drawn from that population, but that we don't know $\theta$.
Evidently, the data is more likely (think about the figure again) to have been drawn from a density with one set of parameters (say, $\theta'$) rather than another (say, $\theta''$).
Our objective is to use the data $y$ to develop our best estimate of $\theta$.
Maximum Likelihood (ML) Estimation
If the draws in $y$ are taken independently from the population with true parameter $\theta$ and density $f(y, \theta)$, then the probability (or likelihood) of observing the sample is
$$ L(\theta \mid y_1, y_2, \ldots, y_n) = f(y_1, \theta)\, f(y_2, \theta) \cdots f(y_n, \theta), $$
since the probability of two or more independent events is just the product of their probabilities.
The function $L(\theta, y)$ is known as the likelihood function.
Since the value of $z$ that maximizes any function $g(z)$ also maximizes $h(g(z))$ whenever $h$ is a strictly increasing function, we often work with the log-likelihood function
$$ \ln L(\theta, y) = \sum_{i=1}^{n} \ln f(y_i, \theta) $$
rather than the likelihood function itself.
The value $\hat\theta$ which maximizes $L(\theta, y)$ (or $\ln L(\theta, y)$) is known as the maximum likelihood estimate of $\theta$.
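Besides turning products into sums, the log has a practical payoff on a computer. This sketch, assuming a simulated sample of 1000 N(0, 1) draws, shows that the raw likelihood (a product of 1000 densities, each below 0.4) underflows to exactly zero in double precision, while the log-likelihood is an ordinary finite number.

```r
# Product of densities vs. sum of log-densities for a simulated N(0, 1) sample
set.seed(4)
y <- rnorm(1000)

L_direct <- prod(dnorm(y))               # likelihood as a raw product
lnL      <- sum(dnorm(y, log = TRUE))    # log-likelihood as a sum

print(c(L_direct = L_direct, lnL = lnL))

# On a small subsample, log(product) and sum(log) agree as they should
y_small <- y[1:5]
stopifnot(isTRUE(all.equal(log(prod(dnorm(y_small))),
                           sum(dnorm(y_small, log = TRUE)))))
```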
ML Estimation: Example 1 (Exponential)
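As a first example, consider using ML to estimate the single parameter $\theta$ from data $y$ known to come from an exponential distribution. The density of the exponential distribution is
$$ f(y, \theta) = \frac{1}{\theta} e^{-y/\theta}. $$
So the likelihood function $L(\theta, y)$ is
$$ L(\theta, y) = \frac{1}{\theta} e^{-y_1/\theta} \cdot \frac{1}{\theta} e^{-y_2/\theta} \cdots \frac{1}{\theta} e^{-y_n/\theta}, $$
and the log-likelihood function is
$$ \ln L(\theta, y) = -n \ln(\theta) - \frac{1}{\theta} \sum_{i=1}^{n} y_i. $$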
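The closed-form log-likelihood can be cross-checked against R's own density. One caution, stated as the key assumption of this sketch: `dexp` is parameterized by the rate $1/\theta$, not by the mean $\theta$ used on these slides. The simulated data (rate 2, so mean 0.5) and the helper names `loglik_closed` and `loglik_dexp` are hypothetical.

```r
# Exponential log-likelihood: closed form vs. summed log-densities.
# dexp() uses rate = 1/theta, whereas the slides use the mean theta.
loglik_closed <- function(theta, y) -length(y) * log(theta) - sum(y) / theta
loglik_dexp   <- function(theta, y) sum(dexp(y, rate = 1 / theta, log = TRUE))

set.seed(5)
y <- rexp(1000, rate = 2)     # simulated data with mean theta = 0.5

thetas <- c(0.3, 0.5, 0.8)
print(cbind(theta  = thetas,
            closed = sapply(thetas, loglik_closed, y = y),
            dexp   = sapply(thetas, loglik_dexp,   y = y)))
```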
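ML Estimation: Example 1 (Exponential)
Because the log-likelihood function is so simple, we can actually maximize it algebraically. Usually, that's not possible.
Taking the derivative of the log-likelihood function with respect to $\theta$ gives
$$ \frac{d \ln L(\theta, y)}{d\theta} = -\frac{n}{\theta} + \frac{\sum_i y_i}{\theta^2}. $$
Setting the derivative equal to zero solves for an interior maximizer of the log-likelihood function:
$$ -\frac{n}{\theta} + \frac{\sum_i y_i}{\theta^2} = 0 \quad\text{or}\quad \hat\theta = \frac{1}{n} \sum_i y_i. $$
(The second derivative at $\hat\theta$ equals $-n/\hat\theta^2 < 0$, so this is indeed a maximum.)
In other words, the ML estimate of $\theta$ for an exponential distribution is just $\hat\theta = \bar y$, the sample mean.
This makes sense because (if you look up the exponential distribution) $\theta$ is the population mean of an exponential population.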
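The algebra can be verified numerically: at $\hat\theta = \bar y$ the score should be zero and the second derivative should equal $-n/\hat\theta^2$. This sketch uses simulated exponential data with mean 0.5; `score` and `hessian` are hypothetical helper names coded from the derivative above and its derivative, $n/\theta^2 - 2\sum_i y_i/\theta^3$.

```r
# First and second derivatives of the exponential log-likelihood
# (score and hessian are hypothetical helper names for this sketch).
score   <- function(theta, y) -length(y) / theta + sum(y) / theta^2
hessian <- function(theta, y)  length(y) / theta^2 - 2 * sum(y) / theta^3

set.seed(6)
y <- rexp(1000, rate = 2)     # simulated data with mean 0.5
theta_hat <- mean(y)          # the analytic ML estimate

print(c(theta_hat = theta_hat,
        score     = score(theta_hat, y),
        hessian   = hessian(theta_hat, y)))
```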
ML Estimation: Example 1 (Exponential)
Generally, we can't solve algebraically for the ML estimator, so we have to use a computer to calculate $\hat\theta$.
To do this for the exponential example, I first create a function which returns the value of the log-likelihood function $\ln L(\theta, y)$. This function has two arguments, but in ML the data is taken as given, so the second argument doesn't change.
> LL.exp = function(theta,y) {
+   n = length(y)
+   LL = -n * log(theta) - (1/theta) * sum(y)
+   return(LL)
+ }
If the number of parameters in $\theta$ is large, I may not be able to plot the log-likelihood function very easily, but I can solve for the ML estimator even if I can't visualize the log-likelihood function.
In this example I only have a single parameter ($\theta$ is a scalar), so a plot is a useful tool.
ML Estimation: Example 1 (Exponential)
The sample (n = 1000) was drawn from an exp(.5) distribution (mean $\theta = 0.5$), so the sample mean will be very near 1/2. The log-likelihood evidently has a maximum around 0.5, but to find out exactly where it is, we will need to find the maximum numerically.
[Figure: $\ln L(\theta, y)$ plotted against $\theta$, for $\theta$ from about 0.4 to 1.0]
ML Estimation: Example 1 (Exponential)
The optim function can be used to perform numerical optimization. Here's an example of using it to solve the last problem:
> sol = optim(1,fn=LL.exp,gr=NULL,y.data,method="BFGS",control=list(fnscale=-1))
> sol$par
[1] 0.4983282
Here's what the arguments mean:
1: The first argument (par) is the starting value for $\theta$.
gr: A function providing the gradient is often used to speed up calculation. The argument gr references that function if it exists (otherwise NULL).
y.data: This is a reference to our data set. Extra arguments that don't match optim's own parameters are passed through to the function to be optimized: LL.exp.
method: There are several options. "BFGS" (a quasi-Newton method) is a safe general-purpose choice; the default, Nelder-Mead, is unreliable for one-parameter problems, and R warns about it.
control: This is a list of control parameters. By setting fnscale to -1, we are saying we want to maximize rather than minimize (the default).
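The same problem can be solved with an analytic gradient passed through `gr`, plus a lower bound that keeps $\theta$ positive. This is a minimal sketch, not the lecture's call: switching to `method = "L-BFGS-B"` (which supports bounds) is a design choice here, `grad.exp` is a hypothetical helper coded from the derivative derived earlier, and `y.data` is simulated as a stand-in for the lecture's data. With `fnscale = -1`, optim scales both `fn` and `gr`, so `gr` should return the gradient of the log-likelihood itself.

```r
# Exponential ML with an analytic gradient and a positivity bound on theta.
LL.exp <- function(theta, y) {
  n <- length(y)
  -n * log(theta) - (1 / theta) * sum(y)
}
# Hypothetical helper: d lnL / d theta = -n/theta + sum(y)/theta^2
grad.exp <- function(theta, y) -length(y) / theta + sum(y) / theta^2

set.seed(7)
y.data <- rexp(1000, rate = 2)   # simulated stand-in for the lecture's sample

sol <- optim(par = 1, fn = LL.exp, gr = grad.exp, y = y.data,
             method = "L-BFGS-B", lower = 1e-6,
             control = list(fnscale = -1))

print(c(optim = sol$par, closed_form = mean(y.data),
        convergence = sol$convergence))
```

Passing the data as the named argument `y = y.data` makes it explicit that it is forwarded to both `fn` and `gr`.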
ML Estimation: Example 1 (Exponential)
Note that our data doesn't have to be drawn from an exponential to fit it with an exponential density. We could do the same thing with data from a U(.25, 1).
[Figure: $\ln L(\theta, y)$ for the U(.25, 1) data, plotted against $\theta$ from about 0.4 to 1.0]
> sol = optim(1,fn=LL.exp,gr=NULL,y.data,method="BFGS",control=list(fnscale=-1))
> sol$par
[1] 0.6135531
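ML Estimation: Example 2 (Normal)
Similarly, we can estimate the parameters of a normal distribution via ML. We don't need to do much algebra to know that, since the population parameters are $\theta = (\mu, \sigma^2)$, the ML estimates of the parameters should end up looking like the sample mean and variance. Let's check, writing $\theta_1 = \mu$ and $\theta_2 = \sigma^2$.
The likelihood function is
$$ L(\theta_1, \theta_2, y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\theta_2}} \exp\left(-\frac{(y_i - \theta_1)^2}{2\theta_2}\right) $$
or
$$ L(\theta_1, \theta_2, y) = \left(\frac{1}{2\pi\theta_2}\right)^{n/2} \exp\left(-\frac{\sum_i (y_i - \theta_1)^2}{2\theta_2}\right). $$
The log-likelihood function is
$$ \ln L(\theta_1, \theta_2, y) = -\frac{n}{2} \ln(2\pi\theta_2) - \frac{\sum_i (y_i - \theta_1)^2}{2\theta_2}. $$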
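As with the exponential case, the closed-form normal log-likelihood can be checked against summed `dnorm` log-densities. The key detail, flagged as the assumption to watch, is that $\theta_2$ is a variance while `dnorm` takes a standard deviation, so `sd = sqrt(theta2)`. The simulated data and the evaluation point (0.1, 1.2) are arbitrary illustrative choices, and `loglik_norm_closed` is a hypothetical helper name.

```r
# Closed-form normal log-likelihood (hypothetical helper name)
loglik_norm_closed <- function(theta1, theta2, y) {
  n <- length(y)
  -(n / 2) * log(2 * pi * theta2) - sum((y - theta1)^2) / (2 * theta2)
}

set.seed(8)
y <- rnorm(1000)

# theta2 is a variance, so dnorm() needs sd = sqrt(theta2)
closed <- loglik_norm_closed(0.1, 1.2, y)
summed <- sum(dnorm(y, mean = 0.1, sd = sqrt(1.2), log = TRUE))
print(c(closed = closed, dnorm = summed))
```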
ML Estimation: Example 2 (Normal)
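The partial derivative of the log-likelihood with respect to $\theta_1$ is
$$ \frac{\partial \ln L}{\partial \theta_1} = \frac{1}{\theta_2} \sum_{i=1}^{n} (y_i - \theta_1). $$
The partial with respect to $\theta_2$ is
$$ \frac{\partial \ln L}{\partial \theta_2} = -\frac{n}{2\theta_2} + \frac{1}{2\theta_2^2} \sum_{i=1}^{n} (y_i - \theta_1)^2. $$
It's left to you as an exercise to set these two expressions equal to zero and show that they lead to
$$ \hat\theta_1 = \bar y \quad\text{and}\quad \hat\theta_2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar y)^2. $$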
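A quick numerical check of these closed-form estimates: both partial derivatives should vanish at $(\hat\theta_1, \hat\theta_2)$, and the ML variance divides by $n$, so it equals R's `var()` (which divides by $n - 1$) times $(n-1)/n$. The small simulated sample (n = 50, mean 2, sd 3) is an illustrative assumption.

```r
# Closed-form ML estimates for the normal model on a small simulated sample
set.seed(9)
y <- rnorm(50, mean = 2, sd = 3)
n <- length(y)

theta1_hat <- mean(y)
theta2_hat <- sum((y - theta1_hat)^2) / n      # ML variance: divides by n

# Both partial derivatives of ln L evaluated at the ML estimates
d1 <- sum(y - theta1_hat) / theta2_hat
d2 <- -n / (2 * theta2_hat) + sum((y - theta1_hat)^2) / (2 * theta2_hat^2)

print(c(theta1_hat = theta1_hat, theta2_hat = theta2_hat,
        var_n_minus_1 = var(y), d1 = d1, d2 = d2))
```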
ML Estimation: Example 2 (Normal)
Now let's solve this as we usually would, numerically. We start with a data set consisting of 1000 draws from a N(0, 1). Below are the log-likelihood function and a plot of contours of the log-likelihood function. The plot clearly shows a maximum $\hat\theta = (\hat\theta_1, \hat\theta_2)$ somewhere in the vicinity of (0, 1), so far so good.
> LL.norm = function(theta,y) {
+   n = length(y)
+   LL = -(n/2) * log(2*pi) - (n/2)*log(theta[2]) -
+     (1/(2*theta[2])) * sum( (y-theta[1])^2 )
+   return(LL)
+ }
ML Estimation: Example 2 (Normal)
[Figure: contour plot of the log-likelihood over t1 (horizontal axis, -0.10 to 0.10) and t2 (vertical axis, 0.8 to 1.2)]
ML Estimation: Example 2 (Normal)
> sol = optim(c(.5,.75),fn=LL.norm,gr=NULL,y.data,
+             method="BFGS",control=list(fnscale=-1))
> sol$par
[1] 0.0295824 1.0092104
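The numerical answer can be compared directly with the closed-form estimates from the exercise. This sketch is an assumption-laden variant of the call above, not the lecture's exact run: it simulates its own N(0, 1) data, supplies an analytic gradient (`grad.norm`, a hypothetical helper coded from the two partial derivatives), and uses `method = "L-BFGS-B"` with a lower bound so the variance parameter `theta[2]` cannot go non-positive.

```r
# Numerical ML for the normal model vs. the closed-form estimates
LL.norm <- function(theta, y) {
  n <- length(y)
  -(n / 2) * log(2 * pi) - (n / 2) * log(theta[2]) -
    (1 / (2 * theta[2])) * sum((y - theta[1])^2)
}
# Hypothetical helper: gradient from the two partial derivatives
grad.norm <- function(theta, y) {
  n <- length(y)
  r <- y - theta[1]
  c(sum(r) / theta[2],
    -n / (2 * theta[2]) + sum(r^2) / (2 * theta[2]^2))
}

set.seed(10)
y.data <- rnorm(1000)   # simulated stand-in for the lecture's sample

sol <- optim(c(0.5, 0.75), fn = LL.norm, gr = grad.norm, y = y.data,
             method = "L-BFGS-B", lower = c(-Inf, 1e-6),
             control = list(fnscale = -1))

closed <- c(mean(y.data), mean((y.data - mean(y.data))^2))
print(rbind(optim = sol$par, closed_form = closed))
```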