MicroEconometrics Lecture10


Economics 440.618 Microeconometrics

Michael T. Sandfort

Department of Applied Economics

The Johns Hopkins University

November 19, 2013



Limited Dependent Variable Models

The models we have looked at so far are appropriate if y is a continuous quantitative variable, e.g., demand, price, output, wage rate, etc.

But there are many interesting economic decisions which are tough to characterize continuously:

For a worker: accept or reject a job offer?

For a firm: enter (if out) or exit (if in) a market?

For a woman: whether to have a child? If so, how many?

For a consumer: which brand of peanut butter to buy?

Today, we are going to start discussing the econometrics of this kind of data, starting with a very simple model: the binary choice model.

The binary choice model is a good entry point to talking about more complicated limited dependent variable models, which is where we will end the course.


Binary Response Model

We say that a dependent variable y we want to model is binary if it takes on only two values.

We typically recode these two values as 1 (yes, true, success, accepted, survived, etc.) and 0 (no, false, failure, rejected, died, etc.).

Whenever the variable we want to model is binary, it's natural to think in terms of probabilities.

What is the probability that an individual with characteristics x owns a home?

If the person's characteristics were x' rather than x, how would that affect the probability that they own a home?

Data for such dependent variables looks just like a dummy variable that one might use as an explanatory variable in a regression.

But the implications of having it on the left-hand side of our structural equation rather than the right-hand side are significant.



Binary Response Model

Assuming we have a random sample, the sample mean of our binary variable is an unbiased estimate of the unconditional probability of success (y = 1). That is,

$$
E\left[\frac{1}{n}\sum_i y_i\right] = \frac{1}{n}\, n\, E(y) = E(y) = 1 \cdot \Pr(y = 1) + 0 \cdot \Pr(y = 0) = \Pr(y = 1)
$$

Estimating the unconditional probability of success is fine, but it won't allow us to explore many interesting policy questions.

"What is the overall rate of home ownership?" vs. "What would be the effect on the home ownership rate of rescinding the home mortgage interest tax deduction?"

Can we use OLS to address a problem like this?




OLS Estimation: The Linear Probability Model (LPM)

A naive way to approach this problem is to simply apply the classical regression model, so

$$y = x\beta + u$$

with OLS.1 and OLS.2.

Using the same logic as on the previous slide, we know that $\Pr(y = 1|x) = E(y|x) = x\beta$.

Since probabilities must sum to one, it is also true that $\Pr(y = 0|x) = 1 - \Pr(y = 1|x) = 1 - x\beta$.

This is a binary response model where the probability of success is a linear function of x, hence the name.

In the linear probability model, the incremental change in probability due to a discrete change $\Delta x_j$ in $x_j$ is $\Delta \Pr(y = 1|x) = \beta_j \Delta x_j$.
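To make the linear probability model concrete, here is a minimal sketch in R. The data are simulated (an assumption for illustration, not the labor force data used in this lecture), and the names `lpm`, `delta_x` and `delta_p` are hypothetical. It fits the model by OLS and reads the slope as the change in probability per unit change in x.

```r
# Simulated data: an assumption for illustration only.
set.seed(1)
n <- 500
x <- runif(n, 0, 10)
p_true <- plogis(-2 + 0.6 * x)          # true success probability (nonlinear in x)
y <- rbinom(n, size = 1, prob = p_true) # binary outcome

# Linear probability model: OLS of the 0/1 outcome on x.
lpm <- lm(y ~ x)
b <- coef(lpm)

# In the LPM, the change in Pr(y = 1 | x) for a change delta_x is beta_j * delta_x.
delta_x <- 2
delta_p <- unname(b["x"] * delta_x)

# Fitted values play the role of predicted probabilities;
# nothing in OLS forces them to stay inside [0, 1].
p_hat <- fitted(lpm)
range(p_hat)
```

Because the model is linear, `delta_p` is the same wherever the change in x starts, which is exactly the property questioned on the following slides.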




LPM: An Example

Consider again the data on women's wages we've looked at before. Rather than dropping the zero-hours observations, we'll try to directly model the labor force participation decision.

Our dependent variable will be y = inlf, which takes the value 1 if the woman was in the labor force and 0 otherwise. We fit a model of labor force participation:

> print(coeftest(res))

t test of coefficients:

             Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)  0.58551922  0.15417800   3.7977  0.0001579 ***
nwifeinc    -0.00340517  0.00144849  -2.3508  0.0189908 *
educ         0.03799530  0.00737602   5.1512  3.317e-07 ***
exper        0.03949239  0.00567267   6.9619  7.376e-12 ***
expersq     -0.00059631  0.00018479  -3.2270  0.0013059 **
age         -0.01609081  0.00248468  -6.4760  1.709e-10 ***
kidslt6     -0.26181047  0.03350579  -7.8139  1.889e-14 ***
kidsge6      0.01301223  0.01319596   0.9861  0.3244154
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This looks OK, but consider the following...



LPM: An Example

On the left is what the data and fitted regression line look like with just inlf as a function of exper. On the right is the marginal effect of an increase in experience on the probability of labor force participation:

[Figure: two panels, each with exper on the horizontal axis. Left: inlf against exper with the fitted regression line. Right: the marginal effect on the predicted probability.]

Note that the predicted probability $\Pr(y = 1|\text{exper}) > 1$ for exper > 10 and that $\Delta \Pr$ exceeds one (100%) for $\Delta\text{exper} > 30$.




Concerns About LPM

Some of the problems are fairly evident, but worth highlighting.

(LHS figure) Some plausible combinations of independent variables give fitted/predicted probabilities less than zero or greater than one. Since a probability must lie between zero and one, this can be embarrassing.

(RHS figure) It doesn't really make sense to say that a probability measure is linearly related to a continuously varying independent variable over a large range. Changes in experience $\Delta\text{exper} > 30$ are within the range of the data, and lead to changes in probability over 100%.

Heteroskedasticity is baked into the model in the sense that

$$
\begin{aligned}
V(u|x) &= \Pr(y = 1|x)[1 - x\beta]^2 + \Pr(y = 0|x)[0 - x\beta]^2 \\
&= x\beta[1 - x\beta]^2 + (1 - x\beta)[x\beta]^2 \\
&= x\beta[1 - x\beta]
\end{aligned}
$$

which is clearly a function of x. Thus, use of a robust variance-covariance matrix is compulsory.
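As a sketch of what "robust" means here, the code below computes White's heteroskedasticity-consistent (HC0) variance matrix by hand in base R and compares it with the usual OLS one. The data are simulated from a true LPM (an assumption for illustration), and all object names are hypothetical. In practice one would more likely use a packaged estimator such as `vcovHC` from the sandwich package.

```r
# Simulated data from a true LPM: Pr(y = 1 | x) = 0.2 + 0.6 x (illustrative assumption).
set.seed(2)
n <- 400
x <- runif(n, 0, 1)
y <- rbinom(n, size = 1, prob = 0.2 + 0.6 * x)

fit <- lm(y ~ x)
X <- model.matrix(fit)   # n x k regressor matrix (intercept and x)
u <- resid(fit)          # OLS residuals

# Implied conditional variance x*beta*(1 - x*beta), evaluated at the fitted values.
p_hat <- fitted(fit)
v_hat <- p_hat * (1 - p_hat)

# HC0 sandwich: (X'X)^{-1} [X' diag(u^2) X] (X'X)^{-1}
XtX_inv <- solve(crossprod(X))
meat <- crossprod(X * u)             # each row of X scaled by its residual
V_robust <- XtX_inv %*% meat %*% XtX_inv

se_ols <- sqrt(diag(vcov(fit)))      # assumes constant error variance
se_robust <- sqrt(diag(V_robust))    # allows the variance to depend on x
cbind(se_ols, se_robust)
```

The vector `v_hat` shows how the implied error variance moves with x, which is the reason the classical standard errors are not appropriate for the LPM.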



Concerns About LPM

Finally, the residuals from the LPM are clearly non-normal, as seen in the figure. A kernel density plot (a continuous approximation of a histogram) of the data is shown in red. The scaled normal density is shown in green for reference.

[Figure: Histogram and Kernel Density for LPM Residuals. Horizontal axis: Residuals; vertical axis: Density.]




An Alternative Model

To address these problems in the LPM, consider an alternative binary response model of the form

$$\Pr(y = 1|x) = G(\beta_1 + \beta_2 x_2 + \cdots + \beta_k x_k) = G(x\beta)$$

where G(w) is a function taking values strictly between zero and one ($0 < G(w) < 1$).



The Logit Model

Two CDFs G(w) are used in most applications. The first is the logit CDF, which has the form

$$G(x\beta) = \frac{\exp(x\beta)}{1 + \exp(x\beta)} = \Lambda(x\beta)$$

[Figure: The Logit CDF $\Lambda(w)$, plotted for w from -3 to 3, with values between 0 and 1.]
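A quick way to get a feel for the logit CDF is to code the formula directly and compare it with R's built-in logistic CDF, `plogis`. The function name `logit_cdf` below is hypothetical; this is only a small check, not part of the lecture's own code.

```r
# The logit CDF written out from its formula.
logit_cdf <- function(w) exp(w) / (1 + exp(w))

w <- seq(-3, 3, by = 0.5)

# Compare the hand-coded version with R's built-in logistic CDF.
all.equal(logit_cdf(w), plogis(w))

# Values stay strictly between zero and one, and the CDF equals 1/2 at w = 0.
range(logit_cdf(w))
logit_cdf(0)
```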




The Probit Model

The second is the standard normal CDF, which has the form

$$G(x\beta) = \int_{-\infty}^{x\beta} \phi(w)\,dw = \Phi(x\beta)$$

where

$$\phi(w) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{w^2}{2}\right)$$

[Figure: The Probit (Normal) CDF $\Phi(w)$, plotted for w from -3 to 3, with values between 0 and 1.]
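The probit CDF has no closed form, but the integral definition can be checked numerically. The sketch below integrates the standard normal density with `integrate` and compares the result with R's built-in `pnorm`. The names `phi` and `Phi_numeric` are hypothetical.

```r
# Standard normal density written out from its formula.
phi <- function(w) exp(-w^2 / 2) / sqrt(2 * pi)

# Probit CDF as the integral of the density from -Inf up to z.
Phi_numeric <- function(z) integrate(phi, lower = -Inf, upper = z)$value

z <- c(-1.96, 0, 1)
cbind(numeric = sapply(z, Phi_numeric), builtin = pnorm(z))
```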




How to estimate?

Both of these functions increase rapidly near the origin and increase much less rapidly toward the extremes.

That is, the partial/marginal effect of an increase in x changes, depending on the level of x.

This is a desirable feature, because constant marginal effects were one of the objections to the LPM.

Nonetheless, it raises a new challenge, because this is no longer a model that we can estimate by OLS.

The approach usually taken to estimation of index function models like the logit and probit is maximum likelihood (ML).

ML estimation is a technique with very broad applications, so we'll spend the rest of this class talking about the ML estimator and then apply it specifically to the logit and probit.
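To see how the marginal effect varies with the index, recall that by the chain rule the derivative of $G(x\beta)$ with respect to $x_j$ is $g(x\beta)\beta_j$, where g is the density associated with G. The sketch below evaluates this at a few index values for a hypothetical coefficient `beta_j` (an assumption, not an estimate from the lecture) and sets it beside the constant LPM effect.

```r
beta_j <- 0.5                  # hypothetical coefficient, for illustration only
index <- c(-3, -1, 0, 1, 3)    # values of the index x*beta

# Marginal effect of x_j on Pr(y = 1 | x) under each model.
me_lpm    <- rep(beta_j, length(index))   # constant in the LPM
me_logit  <- dlogis(index) * beta_j       # logistic density times beta_j
me_probit <- dnorm(index) * beta_j        # normal density times beta_j

data.frame(index, me_lpm, me_logit, me_probit)
```

Both nonlinear effects are largest at an index of zero and shrink symmetrically toward the extremes, which matches the shape of the two CDF plots.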




Maximum Likelihood (ML) Estimation

The principle underlying ML estimation is fairly simple to articulate. Consider the figure below, where the histogram from a sample $y = (y_1, \ldots, y_n)$ from a univariate random variable y is shown. The sample looks like it could have come from a normal distribution, so the same figure shows several normal densities with different parameters $(\mu, \sigma^2)$. From which density does the data most likely come?

[Figure: Histogram of y with several normal densities overlaid. Horizontal axis: y; vertical axis: Density.]




Maximum Likelihood (ML) Estimation

ML estimation is just a way of formalizing this procedure, so you can use the technique in situations with many more parameters, situations where your ability to visualize may fail you.

Formally, suppose a population in which the random variable y is distributed according to some distribution with density $f(y, \theta)$, where $\theta$ is a vector of parameters.

Also suppose that we have a random sample $y = (y_1, \ldots, y_n)$ drawn from that population but that we don't know $\theta$.

Evidently, the data is more likely (think about the figure again) to have been drawn from a density with one set of parameters (say, $\theta'$) rather than another (say, $\theta''$).

Our objective is to use the data y to develop our best estimate of $\theta$.




Maximum Likelihood (ML) Estimation

If the draws in y are taken independently from the population with true parameter $\theta$ and density $f(y, \theta)$, then the probability (or likelihood) of observing the sample is

$$L(\theta | y_1, y_2, \ldots, y_n) = f(y_1, \theta)\, f(y_2, \theta) \cdots f(y_n, \theta)$$

since the probability of two or more independent events is just the product of their probabilities.

The function $L(\theta, y)$ is known as the likelihood function.

Since the maximizer of any function g(z) is the same as the maximizer of h(g(z)) if h is a strictly increasing function, we often work with the log-likelihood function

$$\ln L(\theta, y) = \sum_{i=1}^n \ln f(y_i, \theta)$$

rather than the likelihood function itself.

The value $\hat\theta$ which maximizes $L(\theta, y)$ (or $\ln L(\theta, y)$) is known as the maximum likelihood estimate of $\theta$.
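There is also a practical reason to prefer the log-likelihood, which the sketch below illustrates on simulated data (an assumption for illustration; the function names `lik` and `loglik` are hypothetical). With a few thousand observations the product of densities underflows to zero in floating point, while the sum of log densities stays well behaved.

```r
set.seed(3)
y <- rnorm(2000)   # simulated N(0, 1) sample, variance treated as known

# Likelihood as a product of densities, and log-likelihood as a sum of logs.
lik    <- function(mu) prod(dnorm(y, mean = mu, sd = 1))
loglik <- function(mu) sum(dnorm(y, mean = mu, sd = 1, log = TRUE))

lik(0)      # the product of 2000 small numbers underflows
loglik(0)   # the sum of logs is finite

# A crude grid search over mu: the maximizer sits next to the sample mean.
grid <- seq(-0.5, 0.5, by = 0.01)
mu_grid <- grid[which.max(sapply(grid, loglik))]
c(grid_maximizer = mu_grid, sample_mean = mean(y))
```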


ML Estimation: Example 1 (Exponential)

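As a first example, consider using ML to estimate the single parameter $\theta$ from data y known to come from an exponential distribution. The density of the exponential distribution is

$$f(y, \theta) = \frac{1}{\theta} e^{-y/\theta}$$

So the likelihood function $L(\theta, y)$ is

$$L(\theta, y) = \left(\frac{1}{\theta} e^{-y_1/\theta}\right)\left(\frac{1}{\theta} e^{-y_2/\theta}\right) \cdots \left(\frac{1}{\theta} e^{-y_n/\theta}\right)$$

And the log-likelihood function is

$$\ln L(\theta, y) = -n \ln(\theta) - \frac{1}{\theta}\sum_{i=1}^n y_i$$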


ML Estimation: Example 1 (Exponential)

Because the log-likelihood function is so simple, we can actually maximize it algebraically. Usually, that's not possible.

Taking the derivative of the log-likelihood function with respect to $\theta$ gives

$$\frac{d \ln L(\theta, y)}{d\theta} = -\frac{n}{\theta} + \frac{\sum_i y_i}{\theta^2}$$

Setting the derivative equal to zero solves for an interior maximizer of the log-likelihood function:

$$-\frac{n}{\hat\theta} + \frac{\sum_i y_i}{\hat\theta^2} = 0$$

or $\hat\theta = \frac{1}{n}\sum_i y_i$.

In other words, the ML estimate of $\theta$ for an exponential distribution is just $\hat\theta = \bar{y}$, the sample mean.

This makes sense because (if you look up the exponential distribution) $\theta$ is the population mean of an exponential population.
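The algebra can be checked numerically. The sketch below, on simulated data (an assumption for illustration), evaluates the derivative at the sample mean and also maximizes the log-likelihood with R's one-dimensional optimizer `optimize`. Note that `rexp` is parameterized by the rate, which is $1/\theta$ in the notation used here. The names `LL` and `score` are hypothetical.

```r
set.seed(4)
y <- rexp(1000, rate = 2)     # exponential sample with mean theta = 1/rate = 0.5
n <- length(y)

# Log-likelihood and its derivative, as derived above.
LL    <- function(theta) -n * log(theta) - sum(y) / theta
score <- function(theta) -n / theta + sum(y) / theta^2

theta_hat <- mean(y)          # closed-form ML estimate
score(theta_hat)              # zero up to rounding error

# Numerical maximization over a bracketing interval.
opt <- optimize(LL, interval = c(0.01, 5), maximum = TRUE)
c(closed_form = theta_hat, numeric = opt$maximum)
```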


ML Estimation: Example 1 (Exponential)


Generally, we can't solve algebraically for the ML estimator, so we have to use a computer to calculate $\hat\theta$.

To do this for the exponential example, I first create a function which returns the value of the log-likelihood function $\ln L(\theta, y)$. This function has two arguments, but in ML the data is taken as given, so the second argument doesn't change.

> LL.exp = function(theta,y) {
+   # log-likelihood of an exponential sample y with mean theta
+   n = length(y)
+   LL = -n * log(theta) - (1/theta) * sum(y)
+   return(LL)
+ }

If the number of parameters in $\theta$ is large, I may not be able to plot the log-likelihood function very easily, but I can solve for the ML estimator even if I can't visualize the log-likelihood function.

In this example I only have a single parameter, so a plot is a useful tool.


ML Estimation: Example 1 (Exponential)


The sample (n = 1000) was drawn from an exp(.5) distribution, so the sample mean will be very near 1/2. The log-likelihood evidently has a maximum around 0.5, but to find out what it is exactly, we will need to find the max numerically.

[Figure: the log-likelihood $\ln L(\theta, y)$ plotted against $\theta$ over roughly 0.4 to 1.0.]


ML Estimation: Example 1 (Exponential)


The optim function can be used to perform numerical optimization. Here's an example of using it to solve the last problem:

> sol = optim(1,fn=LL.exp,gr=NULL,y.data,method="BFGS",control=list(fnscale=-1))

> sol$par

[1] 0.4983282

Here's what the arguments mean:

gr: A function providing the gradient is often used to speed up calculation. The argument gr references that function if it exists (otherwise NULL).

y.data: This is a reference to our data set. All unnamed arguments after gr are passed through to the function to be optimized: LL.exp.

method: There are several options. This is a safe one.

control: This is a list of control parameters. By setting fnscale to -1, we are saying we want to maximize rather than minimize (the default).
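Since we already have the derivative of the exponential log-likelihood in closed form, the gr argument can be filled in rather than left NULL. The following is a sketch under stated assumptions: the data are simulated as a stand-in for the lecture's y.data, and `gr.exp` is a hypothetical name for the gradient function. Warnings are suppressed because BFGS may briefly try a negative theta during its line search, where log(theta) is undefined.

```r
# Log-likelihood from the lecture and its analytic derivative.
LL.exp <- function(theta, y) -length(y) * log(theta) - (1 / theta) * sum(y)
gr.exp <- function(theta, y) -length(y) / theta + sum(y) / theta^2

set.seed(5)
y.data <- rexp(1000, rate = 2)   # simulated stand-in: mean 0.5

# The data are passed by name (y = y.data) and forwarded to both fn and gr.
sol <- suppressWarnings(
  optim(1, fn = LL.exp, gr = gr.exp, y = y.data,
        method = "BFGS", control = list(fnscale = -1))
)

c(ml_estimate = sol$par, sample_mean = mean(y.data))
```

Passing the data as a named argument is a design choice: it makes explicit which argument of LL.exp and gr.exp receives the data, instead of relying on position.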



ML Estimation: Example 1 (Exponential)

Note that our data doesn't have to be drawn from an exponential to fit it with an exponential density. We could do the same thing with data from a U(.25, 1).

[Figure: the exponential log-likelihood $\ln L(\theta, y)$ for the uniform data, plotted against $\theta$ over roughly 0.4 to 1.0.]

> sol = optim(1,fn=LL.exp,gr=NULL,y.data,method="BFGS",control=list(fnscale=-1))

> sol$par

[1] 0.6135531
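The estimate above is what we should expect: whatever the data look like, the exponential ML estimate is the sample mean, and a U(.25, 1) population has mean (0.25 + 1)/2 = 0.625. Here is a minimal sketch with freshly simulated uniform data (an assumption, so the numbers will differ slightly from the run above), using `optimize` in place of optim because there is only one parameter.

```r
# Exponential log-likelihood, as defined earlier in the lecture.
LL.exp <- function(theta, y) -length(y) * log(theta) - (1 / theta) * sum(y)

set.seed(6)
y.unif <- runif(1000, min = 0.25, max = 1)   # data that are not exponential

# Maximize over theta; y is forwarded to LL.exp through the ... argument.
opt <- optimize(LL.exp, interval = c(0.05, 5), maximum = TRUE, y = y.unif)

c(ml_estimate = opt$maximum,
  sample_mean = mean(y.unif),
  population_mean = (0.25 + 1) / 2)
```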



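ML Estimation: Example 2 (Normal)

Similarly, we can estimate parameters of a normal via ML. We don't need to do much algebra to know that, since the population parameters are $\theta = (\mu, \sigma^2)$, the ML estimates of the parameters should end up looking like the sample mean and variance. Let's check.

The likelihood function is

$$L(\theta_1, \theta_2, y) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\theta_2}} \exp\left(-\frac{(y_i - \theta_1)^2}{2\theta_2}\right)$$

or

$$L(\theta_1, \theta_2, y) = \left(\frac{1}{\sqrt{2\pi\theta_2}}\right)^n \exp\left(-\frac{\sum_i (y_i - \theta_1)^2}{2\theta_2}\right)$$

The log-likelihood function is

$$\ln L(\theta_1, \theta_2, y) = -\frac{n}{2}\ln(2\pi\theta_2) - \frac{\sum_i (y_i - \theta_1)^2}{2\theta_2}$$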


ML Estimation: Example 2 (Normal)




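The partial derivative of the log-likelihood with respect to $\theta_1$ is

$$\frac{\partial \ln L}{\partial \theta_1} = \frac{1}{\theta_2}\sum_{i=1}^n (y_i - \theta_1)$$

The partial with respect to $\theta_2$ is

$$\frac{\partial \ln L}{\partial \theta_2} = -\frac{n}{2\theta_2} + \frac{1}{2\theta_2^2}\sum_{i=1}^n (y_i - \theta_1)^2$$

It's left to you as an exercise to set these two expressions equal to zero and show that they lead to

$$\hat\theta_1 = \bar{y} \quad\text{and}\quad \hat\theta_2 = \frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2$$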
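Without giving away the algebra, the result can at least be verified numerically. The sketch below uses simulated N(0, 1) data (an assumption for illustration) and hypothetical names `score1` and `score2` for the two partial derivatives. It confirms that both are zero at the stated estimates, and that the ML variance estimate divides by n, unlike R's `var`, which divides by n - 1.

```r
set.seed(7)
y <- rnorm(1000, mean = 0, sd = 1)   # simulated sample
n <- length(y)

# Closed-form ML estimates.
theta1_hat <- mean(y)
theta2_hat <- mean((y - theta1_hat)^2)   # divides by n, not n - 1

# The two partial derivatives of the log-likelihood.
score1 <- function(t1, t2) sum(y - t1) / t2
score2 <- function(t1, t2) -n / (2 * t2) + sum((y - t1)^2) / (2 * t2^2)

# Both are zero (up to rounding) at the ML estimates.
c(score1(theta1_hat, theta2_hat), score2(theta1_hat, theta2_hat))

# Relation to the usual sample variance: a factor of (n - 1) / n.
c(ml_variance = theta2_hat, sample_variance = var(y))
```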






Now let's solve this as we usually would, numerically. We start with a data set consisting of 1000 draws from a N(0, 1). Below are the log-likelihood function and a plot of contours of the log-likelihood function. The plot clearly shows a maximum $\hat\theta = (\hat\theta_1, \hat\theta_2)$ somewhere in the vicinity of (0, 1), so far so good.

> LL.norm = function(theta,y) {
+   # theta[1] is the mean, theta[2] is the variance
+   n = length(y)
+   LL = -(n/2) * log(2*pi) - (n/2)*log(theta[2]) -
+     (1/(2*theta[2])) * sum( (y-theta[1])^2 )
+   return(LL)
+ }






[Figure: contour plot of the log-likelihood, with t1 on the horizontal axis (about -0.10 to 0.10) and t2 on the vertical axis (about 0.8 to 1.2); contour levels run from -1456 to -1424.]






> sol = optim(c(.5,.75),fn=LL.norm,gr=NULL,y.data,
+   method="BFGS",control=list(fnscale=-1))

> sol$par

[1] 0.0295824 1.0092104

