MicroEconometrics Lecture10


  • 8/13/2019 MicroEconometrics Lecture10

    1/27

    Economics 440.618 Microeconometrics

    Michael T. Sandfort

    Department of Applied Economics

    The Johns Hopkins University

    November 19, 2013


    Limited Dependent Variable Models

    The models we have looked at so far are appropriate if $y$ is a continuous quantitative variable, e.g., demand, price, output, wage rate, etc.

    But there are many interesting economic decisions which are tough to characterize continuously:

    For a worker: accept or reject a job offer?

    For a firm: enter (if out) or exit (if in) a market?

    For a woman: whether to have a child? If so, how many?

    For a consumer: which brand of peanut butter to buy?

    Today, we are going to start discussing the econometrics of this kind of data, starting with a very simple model: the binary choice model.

    The binary choice model is a good entry point to talking about more complicated limited dependent variable models, which is where we will end the course.


    Binary Response Model

    We say that a dependent variable $y$ we want to model is binary if it takes on only two values. We typically recode these two values as 1 (yes, true, success, accepted, survived, etc.) and 0 (no, false, failure, rejected, died, etc.).

    Whenever the variable we want to model is binary, it's natural to think in terms of probabilities.

    What is the probability that an individual with characteristics $x$ owns a home?

    If the person's characteristics were $x'$ rather than $x$, how would that affect the probability that they own a home?

    Data for such dependent variables looks just like a dummy variable that one might use as an explanatory variable in a regression.

    But the implications of having it on the left hand side of our structural equation rather than the right hand side are significant.


    Binary Response Model

    Assuming we have a random sample, the sample mean of our binary variable is an unbiased estimate of the unconditional probability of success ($y = 1$). That is,

    \[
    E\left[\frac{1}{n}\sum_i y_i\right] = \frac{1}{n}\, n E(y) = E(y) = 1 \cdot \Pr(y = 1) + 0 \cdot \Pr(y = 0) = \Pr(y = 1)
    \]

    Estimating the unconditional probability of success is fine, but it won't allow us to explore many interesting policy questions.

    "What is the overall rate of home ownership?" vs. "What would be the effect on the home ownership rate of rescinding the home mortgage interest tax deduction?"

    Can we use OLS to address a problem like this?


    OLS Estimation: The Linear Probability Model (LPM)

    A naive way to approach this problem is to simply apply the classical regression model, so

    \[ y = x\beta + u \]

    with OLS.1 and OLS.2.

    Using the same logic as on the previous slide, we know that $\Pr(y = 1 \mid x) = E(y \mid x) = x\beta$.

    Since probabilities must sum to one, it is also true that $\Pr(y = 0 \mid x) = 1 - \Pr(y = 1 \mid x) = 1 - x\beta$.

    This is a binary response model where the probability of success is a linear function of $x$, hence the name.

    In the linear probability model, the incremental change in probability due to a discrete change $\Delta x_j$ in $x_j$ is $\Delta\Pr(y = 1 \mid x) = \beta_j \Delta x_j$.


    LPM: An Example

    Consider again the data on women's wages we've looked at before. Rather than dropping the zero-hours observations, we'll try to directly model the labor force participation decision.

    Our dependent variable will be $y$ = inlf, which takes the value 1 if the woman was in the labor force and 0 otherwise. We fit a model of labor force participation:

    > print(coeftest(res))

    t test of coefficients:

                   Estimate  Std. Error t value  Pr(>|t|)
    (Intercept)  0.58551922  0.15417800  3.7977 0.0001579 ***
    nwifeinc    -0.00340517  0.00144849 -2.3508 0.0189908 *
    educ         0.03799530  0.00737602  5.1512 3.317e-07 ***
    exper        0.03949239  0.00567267  6.9619 7.376e-12 ***
    expersq     -0.00059631  0.00018479 -3.2270 0.0013059 **
    age         -0.01609081  0.00248468 -6.4760 1.709e-10 ***
    kidslt6     -0.26181047  0.03350579 -7.8139 1.889e-14 ***
    kidsge6      0.01301223  0.01319596  0.9861 0.3244154
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    This looks OK, but consider the following...


    LPM: An Example

    On the left is what the data and fitted regression line look like with just inlf as a function of exper. On the right is the marginal effect of an increase in experience on the probability of labor force participation:

    [Figure, left panel: inlf (vertical axis) against exper (horizontal axis) with the fitted regression line. Right panel: the change in the probability of participation (vertical axis) against exper (horizontal axis).]

    Note that the predicted probability $\Pr(y = 1 \mid exper) > 1$ for exper > 10 and that $\Delta\Pr$ exceeds one (100%) for $\Delta$exper > 30.


    Concerns About LPM

    Some of the problems are fairly evident, but worth highlighting:

    (LHS figure) Some plausible combinations of independent variables give fitted/predicted probabilities less than zero or greater than one. Since a probability must lie between zero and one, this can be embarrassing.

    (RHS figure) It doesn't really make sense to say that a probability measure is linearly related to a continuously varying independent variable over a large range. Changes in experience $\Delta$exper > 30 are within the range of the data, and lead to changes in probability over 100%.

    Heteroskedasticity is baked into the model in the sense that

    \[
    \begin{aligned}
    V(u \mid x) &= \Pr(y = 1 \mid x)[1 - x\beta]^2 + \Pr(y = 0 \mid x)[0 - x\beta]^2 \\
                &= x\beta[1 - x\beta]^2 + (1 - x\beta)[x\beta]^2 \\
                &= x\beta[1 - x\beta]
    \end{aligned}
    \]

    which is clearly a function of $x$. Thus, use of a robust variance-covariance matrix is compulsory.
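    The concerns above can be illustrated with a minimal sketch in R. It uses simulated data rather than the labor force data from the lecture; the data-generating process, the seed, and the names x, y, lpm, and V.hc0 are assumptions made only for illustration. It fits an LPM by OLS, counts fitted probabilities outside [0, 1], and computes heteroskedasticity-robust (HC0) standard errors by hand.

```r
# Simulated data (an assumption for illustration, not the course data set)
set.seed(1)
n <- 1000
x <- rnorm(n, mean = 0, sd = 2)
p <- plogis(0.5 + 1.5 * x)          # true success probability, always in (0, 1)
y <- rbinom(n, size = 1, prob = p)  # binary outcome

# Linear probability model: OLS of the binary outcome on x
lpm  <- lm(y ~ x)
phat <- fitted(lpm)

# Fitted "probabilities" are not constrained to lie in [0, 1]
n.outside <- sum(phat < 0 | phat > 1)

# HC0 robust variance: (X'X)^{-1} X' diag(u^2) X (X'X)^{-1}
X     <- model.matrix(lpm)
u     <- residuals(lpm)
XtXi  <- solve(crossprod(X))
V.hc0 <- XtXi %*% crossprod(X * u) %*% XtXi

se.robust <- sqrt(diag(V.hc0))
se.ols    <- sqrt(diag(vcov(lpm)))

print(n.outside)
print(cbind(se.ols, se.robust))
```

    In practice a packaged robust covariance estimator would normally be used; the hand-rolled HC0 formula is shown only to make the calculation explicit.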


    Concerns About LPM

    Finally, the residuals from the LPM are clearly non-normal, as seen in the figure. A kernel density plot (a continuous approximation of a histogram) of the data is shown in red. The scaled normal density is shown in green for reference.

    [Figure: "Histogram and Kernel Density for LPM Residuals". Horizontal axis: Residuals; vertical axis: Density.]


    An Alternative Model

    To address these problems in the LPM, consider an alternative binary response model of the form

    \[ \Pr(y = 1 \mid x) = G(\beta_1 + \beta_2 x_2 + \cdots + \beta_k x_k) = G(x\beta) \]

    where $G(w)$ is a function taking values strictly between zero and one ($0 < G(w) < 1$).


    The Logit Model

    Two CDFs $G(w)$ are used in most applications. The first is the logit CDF, which has the form

    \[ G(x\beta) = \frac{\exp(x\beta)}{1 + \exp(x\beta)} = \Lambda(x\beta) \]

    [Figure: "The Logit CDF $\Lambda(w)$", plotting $\Lambda(w)$ against $w$.]


    The Probit Model

    The second is the standard normal CDF, which has the form

    \[ G(x\beta) = \int_{-\infty}^{x\beta} \phi(w)\,dw = \Phi(x\beta) \]

    where

    \[ \phi(w) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{w^2}{2}\right) \]

    [Figure: "The Probit (Normal) CDF $\Phi(w)$", plotting $\Phi(w)$ against $w$.]
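    As a quick check on the two formulas, here is a small R sketch. The function names Lambda and Phi and the grid w are hypothetical choices for illustration; plogis(), pnorm(), dnorm(), and integrate() are base R functions. It evaluates the logit CDF from its formula and the probit CDF by numerically integrating the standard normal density, then compares each with R's built-in CDF.

```r
# Logit CDF written out from its formula
Lambda <- function(w) exp(w) / (1 + exp(w))

# Probit CDF obtained by numerically integrating the standard normal density
Phi <- function(w) integrate(dnorm, lower = -Inf, upper = w)$value

w <- c(-3, -1, 0, 1, 3)   # hypothetical evaluation points

# Compare with R's built-in CDFs plogis() and pnorm()
logit.check  <- cbind(w, formula  = Lambda(w),       builtin = plogis(w))
probit.check <- cbind(w, integral = sapply(w, Phi),  builtin = pnorm(w))

print(logit.check)
print(probit.check)
```

    Both functions map any real index into the open interval (0, 1), which is exactly the property the LPM lacks.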


    How to estimate?

    Both of these functions increase rapidly near the origin and increase much less rapidly toward the extremes.

    That is, the partial/marginal effect of an increase in $x$ changes, depending on the level of $x$.

    This is a desirable feature, because constant marginal effects were one of the objections to the LPM.

    Nonetheless, it raises a new challenge, because this is no longer a model that we can estimate by OLS.

    The approach usually taken to estimation of index function models like the logit and probit is maximum likelihood (ML).

    ML estimation is a technique with very broad applications, so we'll spend the rest of this class talking about the ML estimator and then apply it specifically to the logit and probit.
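    To see how the marginal effect depends on the level of the index, here is a short R sketch. By the chain rule, the effect of a small change in $x_j$ on $G(x\beta)$ is $g(x\beta)\beta_j$, where $g$ is the derivative of $G$; this formula is a standard result rather than something stated on the slide. The coefficient beta.j and the index values are hypothetical numbers chosen for illustration.

```r
beta.j <- 0.5                  # hypothetical coefficient on x_j
index  <- c(-3, -1, 0, 1, 3)   # hypothetical values of the index x*beta

# Marginal effect of x_j: g(x*beta) * beta_j, where g = dG/dw
me.logit  <- dlogis(index) * beta.j        # logistic density times beta_j
me.probit <- dnorm(index)  * beta.j        # normal density times beta_j
me.lpm    <- rep(beta.j, length(index))    # LPM: constant at every level of x

print(cbind(index, me.lpm, me.logit, me.probit))
```

    The logit and probit effects are largest at an index of zero and shrink toward the extremes, while the LPM effect is the same everywhere.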


    Maximum Likelihood (ML) Estimation

    The principle underlying ML estimation is fairly simple to articulate. Consider the figure below where the histogram from a sample $y = (y_1, \ldots, y_n)$ from a univariate random variable $y$ is shown. The sample looks like it could have come from a normal distribution, so the same figure shows several normal densities $N(\mu, \sigma^2)$, with different $\mu$ and $\sigma^2$. From which density does the data most likely come?

    [Figure: "Histogram of y" with several normal densities overlaid. Horizontal axis: y; vertical axis: Density.]


    Maximum Likelihood (ML) Estimation

    ML estimation is just a way of formalizing this procedure, so you can use the technique in situations with many more parameters, situations where your ability to visualize may fail you.

    Formally, suppose a population in which the random variable $y$ is distributed according to some distribution with density $f(y, \theta)$, where $\theta$ is a vector of parameters.

    Also suppose that we have a random sample $y = (y_1, \ldots, y_n)$ drawn from that population but that we don't know $\theta$.

    Evidently, the data is more likely (think about the figure again) to have been drawn from a density with one set of parameters (say, $\theta'$) rather than another (say, $\theta''$).

    Our objective is to use the data $y$ to develop our best estimate of $\theta$.


    Maximum Likelihood (ML) Estimation

    If the draws in $y$ are taken independently from the population with true parameter $\theta$ and density $f(y, \theta)$, then the probability (or likelihood) of observing the sample is

    \[ L(\theta \mid y_1, y_2, \ldots, y_n) = f(y_1, \theta) f(y_2, \theta) \cdots f(y_n, \theta) \]

    since the joint probability of two or more independent events is just the product of their probabilities.

    The function $L(\theta, y)$ is known as the likelihood function.

    Since the maximizer of any function $g(z)$ is the same as the maximizer of $h(g(z))$ if $h$ is a strictly increasing function, we often work with the log-likelihood function

    \[ \ln L(\theta, y) = \sum_{i=1}^n \ln f(y_i, \theta) \]

    rather than the likelihood function itself.

    The value $\hat{\theta}$ which maximizes $L(\theta, y)$ (or $\ln L(\theta, y)$) is known as the maximum likelihood estimate of $\theta$.
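    The idea of comparing candidate densities by their likelihood can be sketched in R as follows. The simulated sample, the seed, the helper name loglik, and the candidate parameter pairs are all assumptions chosen for illustration. The sketch evaluates the normal log-likelihood at a few candidate values of $(\mu, \sigma^2)$ and also shows one practical reason to work in logs: the raw product of many densities underflows to zero.

```r
set.seed(42)
y <- rnorm(1000, mean = 0, sd = 1)   # simulated sample (assumption)

# Log-likelihood of a normal sample: the sum of the log densities
loglik <- function(mu, sigma2, y) {
  sum(dnorm(y, mean = mu, sd = sqrt(sigma2), log = TRUE))
}

# A few hypothetical candidate (mu, sigma^2) pairs
cand <- data.frame(mu = c(0, 1, 0, -1), sigma2 = c(1, 1, 4, 0.25))
cand$loglik <- mapply(loglik, cand$mu, cand$sigma2, MoreArgs = list(y = y))
print(cand)

# The raw likelihood is a product of 1000 small numbers and underflows
# to zero in double precision, while the log-likelihood stays finite
raw.lik <- prod(dnorm(y, mean = 0, sd = 1))
print(raw.lik)
```

    Among these candidates, the pair closest to the parameters used to simulate the data has the highest log-likelihood.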


    ML Estimation: Example 1 (Exponential)

    As a first example, consider using ML to estimate the single parameter $\theta$ from data $y$ known to come from an exponential distribution. The density of the exponential distribution is

    \[ f(y, \theta) = \frac{1}{\theta} e^{-y/\theta} \]

    So the likelihood function $L(\theta, y)$ is

    \[ L(\theta, y) = \frac{1}{\theta} e^{-y_1/\theta} \cdot \frac{1}{\theta} e^{-y_2/\theta} \cdots \frac{1}{\theta} e^{-y_n/\theta} \]

    And the log-likelihood function is

    \[ \ln L(\theta, y) = -n \ln(\theta) - \frac{1}{\theta} \sum_{i=1}^n y_i \]


    ML Estimation: Example 1 (Exponential)

    Because the log-likelihood function is so simple, we can actually solve it algebraically. Usually, that's not possible.

    Taking the derivative of the log-likelihood function with respect to $\theta$ gives

    \[ \frac{d \ln L(\theta, y)}{d\theta} = -\frac{n}{\theta} + \frac{\sum_i y_i}{\theta^2} \]

    Setting the derivative equal to zero solves for an interior maximizer of the log-likelihood function:

    \[ -\frac{n}{\hat{\theta}} + \frac{\sum_i y_i}{\hat{\theta}^2} = 0 \]

    or $\hat{\theta} = \frac{1}{n} \sum_i y_i$.

    In other words, the ML estimate of $\theta$ for an exponential distribution is just $\hat{\theta} = \bar{y}$, the sample mean.

    This makes sense because (if you look up the exponential distribution) $\theta$ is the population mean of an exponential population.
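    The algebra above can be checked numerically with a short R sketch. The simulated sample, the seed, and the helper names score and hess are assumptions for illustration. Note that R's rexp() is parameterized by the rate $1/\theta$, so a mean of 0.5 corresponds to rate = 2. The second derivative is included to confirm that the stationary point is a maximum; it is derived here by differentiating the score and is not shown on the slide.

```r
set.seed(123)
y <- rexp(1000, rate = 2)   # rate = 1/theta, so the population mean theta is 0.5

# Score: first derivative of the log-likelihood with respect to theta
score <- function(theta, y) -length(y) / theta + sum(y) / theta^2

# Second derivative of the log-likelihood (derivative of the score)
hess <- function(theta, y) length(y) / theta^2 - 2 * sum(y) / theta^3

theta.hat <- mean(y)        # closed-form ML estimate: the sample mean

print(theta.hat)
print(score(theta.hat, y))  # zero up to floating-point error
print(hess(theta.hat, y))   # negative, so theta.hat is a maximizer
```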


    ML Estimation: Example 1 (Exponential)

    Generally, we can't solve algebraically for the ML estimator, so we have to use a computer to calculate $\hat{\theta}$.

    To do this for the exponential example, I first create a function which returns the value of the log-likelihood function $\ln L(\theta, y)$. This function has two arguments, but in ML the data is taken as given, so the second argument doesn't change.

    > LL.exp = function(theta,y) {
    +   n = length(y)
    +   LL = -n * log(theta) - (1/theta) * sum(y)
    +   return(LL)
    + }

    If the number of parameters in $\theta$ is large, I may not be able to plot the log-likelihood function very easily, but I can solve for the ML estimator even if I can't visualize the log-likelihood function.

    In this example I only have a single parameter ($\theta$ is a scalar), so a plot is a useful tool.


    ML Estimation: Example 1 (Exponential)

    The sample ($n = 1000$) was drawn from an exp(.5) distribution, so the sample mean will be very near $\frac{1}{2}$. The log-likelihood evidently has a maximum around 0.5, but to find out what it is exactly, we will need to find the max numerically.

    [Figure: the log-likelihood $\ln L(\theta, y)$ (vertical axis) plotted against $\theta$ (horizontal axis).]


    ML Estimation: Example 1 (Exponential)

    The optim function can be used to perform numerical optimization. Here's an example of using it to solve the last problem:

    > sol = optim(1,fn=LL.exp,gr=NULL,y.data,method="BFGS",control=list(fnscale=-1))
    > sol$par
    [1] 0.4983282

    Here's what the arguments mean:

    gr       A function providing the gradient is often used to speed up calculation. The argument gr references that function if it exists (otherwise NULL).

    y.data   This is a reference to our data set. All unnamed arguments after gr are passed through to the function to be optimized: LL.exp.

    method   There are several options. This is a safe one.

    control  This is a list of control parameters. By setting fnscale to -1, we are saying we want to maximize rather than minimize (the default).
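    As a sketch of how the gr argument might be used, the R code below supplies the analytic derivative of the exponential log-likelihood instead of NULL. The gradient function name LL.exp.gr, the simulated sample, the seed, and the guard that returns -Inf for non-positive theta are assumptions added for illustration; they are not part of the lecture's code.

```r
# Log-likelihood for the density f(y, theta) = (1/theta) * exp(-y/theta)
LL.exp <- function(theta, y) {
  if (theta <= 0) return(-Inf)   # outside the parameter space (added guard)
  n <- length(y)
  -n * log(theta) - (1 / theta) * sum(y)
}

# Analytic gradient: d lnL / d theta = -n/theta + sum(y)/theta^2
LL.exp.gr <- function(theta, y) {
  -length(y) / theta + sum(y) / theta^2
}

set.seed(2013)
y.data <- rexp(1000, rate = 2)   # simulated stand-in for the lecture's sample

# The named argument y is passed through to both fn and gr
sol <- optim(par = 1, fn = LL.exp, gr = LL.exp.gr, y = y.data,
             method = "BFGS", control = list(fnscale = -1))

print(sol$par)          # numerical maximizer
print(mean(y.data))     # closed-form ML estimate, for comparison
print(sol$convergence)  # 0 indicates that optim reports convergence
```

    Passing the data by name (y = y.data) rather than by position makes it explicit which argument of the objective receives it.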


    ML Estimation: Example 1 (Exponential)

    Note that our data doesn't have to be drawn from an exponential to fit it with an exponential density. We could do the same thing with data from a $U(.25, 1)$.

    [Figure: the log-likelihood $\ln L(\theta, y)$ (vertical axis) plotted against $\theta$ (horizontal axis) for the uniform sample.]

    > sol = optim(1,fn=LL.exp,gr=NULL,y.data,method="BFGS",control=list(fnscale=-1))
    > sol$par
    [1] 0.6135531
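    A minimal R sketch of the same idea, under stated assumptions: the uniform sample is simulated here with an arbitrary seed, and the one-dimensional optimizer optimize() is swapped in for optim() because there is only one parameter. Since the exponential ML estimate is the sample mean whatever the true distribution, the numerical maximizer should land near the mean of the uniform sample (the population mean of $U(.25, 1)$ is 0.625).

```r
set.seed(7)
y.unif <- runif(1000, min = 0.25, max = 1)   # data that are not exponential

# Exponential log-likelihood, as in the lecture, with an added guard
LL.exp <- function(theta, y) {
  if (theta <= 0) return(-Inf)               # outside the parameter space
  -length(y) * log(theta) - sum(y) / theta
}

# One-dimensional maximization over a bracketing interval (hypothetical bounds)
fit <- optimize(LL.exp, interval = c(0.01, 5), y = y.unif, maximum = TRUE)

print(fit$maximum)    # numerical maximizer of the exponential log-likelihood
print(mean(y.unif))   # sample mean of the uniform data
```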


    ML Estimation: Example 2 (Normal)

    Similarly, we can estimate parameters of a normal via ML. We don't need to do much algebra to know that, since the population parameters are $\theta = (\mu, \sigma^2)$, the ML estimates of the parameters should end up looking like the sample mean and variance. Let's check.

    The likelihood function is

    \[ L(\theta_1, \theta_2, y) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\theta_2}} \exp\left(-\frac{(y_i - \theta_1)^2}{2\theta_2}\right) \]

    or

    \[ L(\theta_1, \theta_2, y) = \left(\frac{1}{\sqrt{2\pi\theta_2}}\right)^n \exp\left(-\frac{\sum_i (y_i - \theta_1)^2}{2\theta_2}\right) \]

    The log-likelihood function is

    \[ \ln L(\theta_1, \theta_2, y) = -\frac{n}{2} \ln(2\pi\theta_2) - \frac{\sum_i (y_i - \theta_1)^2}{2\theta_2} \]


    ML Estimation: Example 2 (Normal)

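    The partial derivative of the log-likelihood with respect to $\theta_1$ is

    \[ \frac{\partial \ln L}{\partial \theta_1} = \frac{1}{\theta_2} \sum_{i=1}^n (y_i - \theta_1) \]

    The partial with respect to $\theta_2$ is

    \[ \frac{\partial \ln L}{\partial \theta_2} = -\frac{n}{2\theta_2} + \frac{1}{2\theta_2^2} \sum_{i=1}^n (y_i - \theta_1)^2 \]

    It's left to you as an exercise to set these two expressions equal to zero and show that they lead to

    \[ \hat{\theta}_1 = \bar{y} \quad \text{and} \quad \hat{\theta}_2 = \frac{1}{n} \sum_{i=1}^n (y_i - \bar{y})^2 \]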


    ML Estimation: Example 2 (Normal)

    Now let's solve this as we usually would, numerically. We start with a data set consisting of 1000 draws from a $N(0, 1)$. Below are the log-likelihood function and a plot of contours of the log-likelihood function. The plot clearly shows a maximum $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2)$ somewhere in the vicinity of $(0, 1)$, so far so good.

    > LL.norm = function(theta,y) {
    +   n = length(y)
    +   LL = -(n/2) * log(2*pi) - (n/2)*log(theta[2]) -
    +     (1/(2*theta[2])) * sum( (y-theta[1])^2 )
    +   return(LL)
    + }


    ML Estimation: Example 2 (Normal)

    [Figure: contour plot of the log-likelihood over t1 (horizontal axis, about -0.10 to 0.10) and t2 (vertical axis, about 0.8 to 1.2).]


    ML Estimation: Example 2 (Normal)

    > sol = optim(c(.5,.75),fn=LL.norm,gr=NULL,y.data,
    +   method="BFGS",control=list(fnscale=-1))
    > sol$par
    [1] 0.0295824 1.0092104
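    As a cross-check, the numerical solution can be compared with the closed-form estimates from the earlier slide. The R sketch below is self-contained and makes several assumptions: the sample is simulated with an arbitrary seed (so the numbers will differ from those above), and a guard returning -Inf for a non-positive variance is added to the log-likelihood. It also shows that the ML variance estimate uses the divisor $n$, whereas R's var() uses $n - 1$.

```r
# Normal log-likelihood with theta = (mean, variance)
LL.norm <- function(theta, y) {
  if (theta[2] <= 0) return(-Inf)   # variance must be positive (added guard)
  n <- length(y)
  -(n / 2) * log(2 * pi * theta[2]) - sum((y - theta[1])^2) / (2 * theta[2])
}

set.seed(27)
y.data <- rnorm(1000, mean = 0, sd = 1)   # simulated stand-in for the lecture's sample

sol <- optim(c(0.5, 0.75), fn = LL.norm, gr = NULL, y = y.data,
             method = "BFGS", control = list(fnscale = -1))

# Closed-form ML estimates: sample mean and variance with divisor n
n <- length(y.data)
closed.form <- c(mean(y.data), mean((y.data - mean(y.data))^2))

print(rbind(optim = sol$par, closed.form = closed.form))

# var() divides by n - 1, so rescaling by (n - 1)/n recovers the ML estimate
print(var(y.data) * (n - 1) / n)
```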