Structural Econometrics: Dynamic Discrete Choice Jean-Marc ...uctpjmr/Structural... · Structural...

Structural Econometrics: Dynamic Discrete Choice

Jean-Marc Robin

Plan

1. Dynamic discrete choice models

2. Application: college and career choice


Dynamic discrete choice models

See for example the presentation by Wolpin (AER, 1996).

At each discrete date $t$, an individual has to choose one action among $K$ possible actions.

Let

$$d_k(t) = \begin{cases} 1 & \text{if } k \text{ is the chosen action,} \\ 0 & \text{otherwise.} \end{cases}$$

Let $d(t) = (d_1(t), \ldots, d_K(t))$ or $d(t) = \sum_{k=1}^{K} k\, d_k(t)$ be the choice variable.

Let $S(t) \in \mathcal{S}$ be the state variable (i.e., the information at the beginning of period $t$ when the action is chosen). Assume $\mathcal{S}$ discrete: $\mathcal{S} = \{s_1, \ldots, s_N\}$ (in any case the computer will require a discrete state space).

Action $k$ yields payoff $R_k(S(t), t)$.

The state transition probability matrix is

$$p_{ij}(k, t) = \Pr\{S(t+1) = s_j \mid S(t) = s_i,\ d_k(t) = 1\}.$$


Strategies

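A strategy is a sequence of functions

$$D(\cdot, t) : \mathcal{S} \to \{0, 1\}^K, \qquad s \mapsto D(s, t) = (D_1(s, t), \ldots, D_K(s, t)).$$

Individuals seek the strategy $D$ that maximises the expected discounted sum of future payoffs:

$$V(S(t), t) = \max_{D(\cdot, \cdot)} E\left[\sum_{\tau = t}^{T} \beta^{\tau - t} \sum_{k=1}^{K} D_k(S(\tau), \tau)\, R_k(S(\tau), \tau) \,\middle|\, S(t)\right],$$

where $\beta$ is the discount factor.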


Bellman principle

Write, for $s \in \mathcal{S}$,

$$V(s, t) = \max\{V_1(s, t), \ldots, V_K(s, t)\}$$

where $V_k(s, t)$ is the present value if action $k$ is chosen at $t$ when $S(t) = s$:

$$V_k(s, t) = R_k(s, t) + \beta\, E\left[V(S(t+1), t+1) \mid S(t) = s,\ d_k(t) = 1\right]$$

and

$$V_k(s, T) = R_k(s, T).$$

The optimal strategy is

$$D_k(s, t) = 1 \iff V_k(s, t) = \max\{V_1(s, t), \ldots, V_K(s, t)\}$$

and then

$$V(s, t) = \sum_{k=1}^{K} D_k(s, t)\, V_k(s, t).$$


Solution

Start from the terminal period $T$ and, for all $s \in \mathcal{S}$, determine the action which maximises the payoff $R_k(s, T)$:

$$D_k(s, T) = 1 \iff R_k(s, T) = \max\{R_1(s, T), \ldots, R_K(s, T)\}$$

and

$$V(s, T) = \sum_{k=1}^{K} D_k(s, T)\, R_k(s, T).$$

Then determine $D(s, t)$ recursively: for all $s \in \mathcal{S}$,

$$D_k(s, t) = 1 \iff V_k(s, t) = \max\{V_1(s, t), \ldots, V_K(s, t)\}$$

where, for all $s_1, \ldots, s_N$,

$$\begin{aligned} V_k(s_i, t) &= R_k(s_i, t) + \beta\, E\left[V(S(t+1), t+1) \mid S(t) = s_i,\ d_k(t) = 1\right] \\ &= R_k(s_i, t) + \beta \sum_{j=1}^{N} p_{ij}(k, t) \underbrace{V(s_j, t+1)}_{= \sum_{k=1}^{K} D_k(s_j, t+1)\, V_k(s_j, t+1)}. \end{aligned}$$

Curse of dimensionality: a huge number of computations and a large memory size are required to compute $V_k(s, t)$ $\forall k, s, t$.
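The backward recursion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `solve_backward`, the nested-list layout of `R` and `P`, the toy payoffs and the value `beta=0.9` are all hypothetical and not taken from the slides. Ties between actions are broken in favour of the lowest action index.

```python
def solve_backward(R, P, beta):
    """Finite-horizon backward induction on a discrete state space.

    R[t][k][i]    : payoff of action k in state i at date t, for t = 0..T
    P[t][k][i][j] : Pr{S(t+1) = j | S(t) = i, action k}, for t = 0..T-1
    Returns (V, D): V[t][i] is the value, D[t][i] the optimal action index.
    """
    T = len(R) - 1
    K, N = len(R[0]), len(R[0][0])
    V = [[0.0] * N for _ in range(T + 1)]
    D = [[0] * N for _ in range(T + 1)]
    for t in range(T, -1, -1):              # from the terminal date backwards
        for i in range(N):
            values = []
            for k in range(K):
                v_k = R[t][k][i]
                if t < T:                    # add the discounted continuation value
                    v_k += beta * sum(P[t][k][i][j] * V[t + 1][j] for j in range(N))
                values.append(v_k)
            D[t][i] = max(range(K), key=lambda k: values[k])
            V[t][i] = values[D[t][i]]
    return V, D


# Hypothetical toy problem: 2 states, 2 actions, 2 dates (T = 1).
payoff = [[1.0, 0.0],    # action 0 pays 1 in state 0 and 0 in state 1
          [0.0, 2.0]]    # action 1 pays 0 in state 0 and 2 in state 1
stay = [[1.0, 0.0], [0.0, 1.0]]    # action 0 keeps the current state
move = [[0.0, 1.0], [0.0, 1.0]]    # action 1 leads to state 1
R = [payoff, payoff]
P = [[stay, move]]
V, D = solve_backward(R, P, beta=0.9)
print(V[0], D[0])
```

The triple loop costs on the order of $T \times K \times N^2$ operations and stores $T \times N$ values, which is where the curse of dimensionality bites once the state space is a product of several variables.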


Estimation

Parameters: in the payoff functions $R_k(s, t)$ and transition probabilities $p_{ij}(k, t)$.

Inference: maximum likelihood or (simulated) method of moments.

Data: individual sequences

$$y^h = \left(x^h(t^h_0), d^h(t^h_0), x^h(t^h_0 + 1), d^h(t^h_0 + 1), \ldots, x^h(t^h_1), d^h(t^h_1)\right)$$

for individuals $h = 1, \ldots, H$ and $t \in \{t^h_0, t^h_0 + 1, \ldots, t^h_1\}$, where $x^h(t) \in \{x_1, \ldots, x_I\}$ is the observed part of the state variables, i.e. $S^h(t) = \left(x^h(t), \varepsilon^h(t)\right)$, with the following assumptions on the process of shocks $\varepsilon^h(t)$:

• $\varepsilon^h(t) = \left(\varepsilon^h_1(t), \ldots, \varepsilon^h_K(t)\right)$ iid;

• $R_k(S^h(t), t) = R_k(x^h(t), t) + \varepsilon^h_k(t)$;

• conditional independence:

$$\Pr\{x^h(t+1), \varepsilon^h(t+1) \mid x^h(t), \varepsilon^h(t), d^h_k(t) = 1\} = \Pr\left(\varepsilon^h(t+1)\right) \times \underbrace{\Pr\{x^h(t+1) = x_j \mid x^h(t) = x_i,\ d^h_k(t) = 1\}}_{\equiv\, p_{ij}(k, t)}.$$


Likelihood

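The conditional likelihood of $y^h$ given $x^h(t^h_0)$ is

$$\begin{aligned} \ell\left(y^h \mid x^h(t^h_0)\right) = {} & \Pr\{d^h(t^h_0) \mid x^h(t^h_0)\} \times \Pr\{x^h(t^h_0 + 1) \mid x^h(t^h_0), d^h(t^h_0)\} \\ & \times \Pr\{d^h(t^h_0 + 1) \mid x^h(t^h_0 + 1)\} \times \Pr\{x^h(t^h_0 + 2) \mid x^h(t^h_0 + 1), d^h(t^h_0 + 1)\} \\ & \times \cdots \times \Pr\{d^h(t^h_1) \mid x^h(t^h_1)\} \end{aligned}$$

where

$$\Pr\{d^h_k(t) = 1 \mid x^h(t)\} = \Pr\{\varepsilon^h(t) \text{ s.t. } D_k(x^h(t), \varepsilon^h(t), t) = 1 \mid x^h(t)\}.$$

The conditional likelihood of the sample is

$$\prod_{h=1}^{H} \ell\left(y^h \mid x^h(t^h_0)\right).$$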


Choice probabilities

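$$\Pr\{d^h_k(t) = 1 \mid x^h(t) = x_i\} = \Pr\{\varepsilon^h_k(t) \geq \varepsilon^h_m(t) + \bar V_m(x_i, t) - \bar V_k(x_i, t),\ \forall m \neq k \mid x^h(t) = x_i\},$$

where

$$\bar V_k(x_i, t) = R_k(x_i, t) + \beta \sum_{j=1}^{I} p_{ij}(k, t)\, \bar V(x_j, t+1),$$

$$p_{ij}(k, t) = \Pr\{x^h(t+1) = x_j \mid x^h(t) = x_i,\ d^h_k(t) = 1\},$$

$$\bar V(x_j, t+1) = E \max_{k = 1, \ldots, K} \left[\bar V_k(x_j, t+1) + \varepsilon^h_k(t+1)\right].$$

For instance, for $(X_1, X_2)$ Gaussian with means $m_1, m_2$, standard deviations $\sigma_1, \sigma_2$ and correlation $\rho$,

$$\begin{aligned} E \max\{X_1, X_2\} &= E X_2 + E \max\{X_1 - X_2, 0\} \\ &= m_2 + (m_1 - m_2)\, \Phi\!\left(\frac{m_1 - m_2}{\sigma}\right) + \sigma\, \varphi\!\left(\frac{m_1 - m_2}{\sigma}\right), \end{aligned}$$

where $\sigma = \mathrm{Std}(X_1 - X_2) = \sqrt{\sigma_1^2 + \sigma_2^2 - 2 \rho \sigma_1 \sigma_2}$.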
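As a sanity check on the Gaussian formula, the closed form can be compared with a crude Monte Carlo average. The sketch below is illustrative only: the function names, parameter values, number of draws and seed are hypothetical, and the closed form requires $\sigma > 0$ (i.e. $X_1 - X_2$ not degenerate).

```python
import math
import random


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def emax_gaussian(m1, m2, s1, s2, rho):
    """Closed form for E max{X1, X2} with (X1, X2) jointly Gaussian."""
    sigma = math.sqrt(s1 ** 2 + s2 ** 2 - 2.0 * rho * s1 * s2)
    z = (m1 - m2) / sigma
    return m2 + (m1 - m2) * norm_cdf(z) + sigma * norm_pdf(z)


def emax_simulated(m1, m2, s1, s2, rho, n_draws=100_000, seed=0):
    """Monte Carlo average of max{X1, X2} from correlated normal draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = m1 + s1 * u1
        x2 = m2 + s2 * (rho * u1 + math.sqrt(1.0 - rho ** 2) * u2)
        total += max(x1, x2)
    return total / n_draws


closed = emax_gaussian(1.0, 0.5, 1.0, 2.0, 0.3)
simulated = emax_simulated(1.0, 0.5, 1.0, 2.0, 0.3)
print(closed, simulated)
```

With more than two alternatives no such simple closed form is available for general Gaussian shocks, which is why simulation or numerical integration is typically used for the expected maximum.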


Two stage estimation

One can proceed in two stages to save computer time, although at the cost of some efficiency loss.

1. Maximise the partial likelihood of state changes:

$$\prod_{h=1}^{H} \Pr\{x^h(t^h_0 + 1) \mid x^h(t^h_0), d^h(t^h_0)\} \times \Pr\{x^h(t^h_0 + 2) \mid x^h(t^h_0 + 1), d^h(t^h_0 + 1)\} \times \cdots \times \Pr\{x^h(t^h_1) \mid x^h(t^h_1 - 1), d^h(t^h_1 - 1)\},$$

with respect to the parameters of $\Pr\{x(t+1) \mid x(t), d(t), t\}$.

2. Maximise the likelihood of the sequence of decisions:

$$\prod_{h=1}^{H} \Pr\{d^h(t^h_0) \mid x^h(t^h_0)\} \times \cdots \times \Pr\{d^h(t^h_1) \mid x^h(t^h_1)\}$$

using the estimated $\Pr\{x(t+1) \mid x(t), d(t), t\}$ to compute the present value functions necessary to calculate the choice probabilities.
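For the first stage, if the transition probabilities are left unrestricted and assumed not to depend on $t$ (both are assumptions made here for illustration, not choices stated in the slides), the partial likelihood of state changes is maximised by the empirical transition frequencies. A minimal sketch, with a hypothetical function name and toy data:

```python
def estimate_transitions(sequences, n_states, n_actions):
    """Frequency estimator of p_ij(k) = Pr{x(t+1) = j | x(t) = i, action k}.

    sequences: list of individual histories, each a list of (x, d) pairs
    ordered by date, with x in 0..n_states-1 and d in 0..n_actions-1.
    Rows (k, i) that are never observed are left as all zeros.
    """
    counts = [[[0] * n_states for _ in range(n_states)] for _ in range(n_actions)]
    for seq in sequences:
        # Pair each (state, action) with the state observed at the next date.
        for (x, d), (x_next, _) in zip(seq, seq[1:]):
            counts[d][x][x_next] += 1
    probs = []
    for k in range(n_actions):
        rows = []
        for i in range(n_states):
            total = sum(counts[k][i])
            rows.append([c / total if total else 0.0 for c in counts[k][i]])
        probs.append(rows)
    return probs


# Hypothetical toy panel: two individuals observed over three dates.
histories = [
    [(0, 0), (0, 1), (1, 0)],
    [(0, 1), (1, 1), (1, 0)],
]
p_hat = estimate_transitions(histories, n_states=2, n_actions=2)
print(p_hat)
```

The estimated matrices would then be plugged into the value-function recursion in the second stage.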


Unobserved heterogeneity

The two-stage estimation procedure does not work if there exists unobserved heterogeneity.

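Assume that $S^h(t) = \left(x^h(t), \varepsilon^h(t), \nu^h\right)$ where $\nu^h \in \{1, \ldots, M\}$ indicates a particular way of grouping individuals. All individuals with the same $\nu^h$ have a specific value of the parameters governing payoff functions and state probabilities.

Let $\Pr\{\nu^h = m\} = \pi_m$, $m \in \{1, \ldots, M\}$.

The likelihood becomes

$$\prod_{h=1}^{H} \ell\left(y^h \mid x^h(t^h_0)\right) = \prod_{h=1}^{H} \left(\sum_{m=1}^{M} \pi_m\, \ell\left(y^h \mid x^h(t^h_0), m\right)\right)$$

where

$$\begin{aligned} \ell\left(y^h \mid x^h(t^h_0), \nu^h\right) = {} & \Pr\{d^h(t^h_0) \mid x^h(t^h_0), \nu^h\} \times \Pr\{x^h(t^h_0 + 1) \mid x^h(t^h_0), \nu^h, d^h(t^h_0)\} \\ & \times \Pr\{d^h(t^h_0 + 1) \mid x^h(t^h_0 + 1), \nu^h\} \times \Pr\{x^h(t^h_0 + 2) \mid x^h(t^h_0 + 1), \nu^h, d^h(t^h_0 + 1)\} \\ & \times \cdots \times \Pr\{d^h(t^h_1) \mid x^h(t^h_1), \nu^h\}. \end{aligned}$$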


EM algorithm

Let $y = (y_1, \ldots, y_H)$ be a vector of observations. Let $z = (z_1, \ldots, z_H)$ be unobserved covariates. The likelihood of $(y, z)$ is $f(y, z; \theta)$.

Since $z$ is not observed, one estimates $\theta$ by maximising the integrated likelihood:

$$f(y; \theta) = \int f(y, z; \theta)\, \mu(dz).$$

This integral may be difficult to compute and the numerical approximation may yield unstable Newton-type optimisation algorithms (numerical errors accumulate instead of averaging). The EM algorithm is often preferable.

The EM algorithm iterates the following step until numerical convergence (which is generally slow):

$$\theta^{(p)} = \arg\max_{\theta} Q\left(\theta \mid \theta^{(p-1)}\right),$$

where

$$Q\left(\theta \mid \theta^{(p-1)}\right) = E\left[\ln f(y, z; \theta) \mid y; \theta^{(p-1)}\right] = \int p\{z \mid y; \theta^{(p-1)}\} \ln f(y, z; \theta)\, \mu(dz).$$

Each iteration increases the likelihood, and the sequence of iterates converges toward a local maximum of the likelihood.


EM algorithm: discrete mixtures

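Assume $z_i \in \{1, \ldots, M\}$ and $\pi_m = \Pr\{z_i = m\}$.

Then $\theta = (\alpha, \pi)$ where $\alpha$ indexes $f(y_i \mid z_i; \alpha)$ and $\pi = (\pi_1, \ldots, \pi_M)$.

We have

$$f(y, z; \theta) = \prod_{i=1}^{H} f(y_i, z_i; \theta) = \prod_{i=1}^{H} \pi_{z_i} f(y_i \mid z_i; \alpha),$$

so that the integrated likelihood is

$$f(y; \theta) = \prod_{i=1}^{H} \left[\sum_{m=1}^{M} \pi_m f(y_i \mid z_i = m; \alpha)\right].$$

Step E (expectation): Use Bayes' rule to compute posterior probabilities:

$$p\{z_i = m \mid y_i; \theta^{(p-1)}\} = \frac{\pi^{(p-1)}_m f\left(y_i \mid z_i = m; \alpha^{(p-1)}\right)}{\sum_{m'=1}^{M} \pi^{(p-1)}_{m'} f\left(y_i \mid z_i = m'; \alpha^{(p-1)}\right)}$$

and

$$\begin{aligned} Q\left(\theta \mid \theta^{(p-1)}\right) &= \int p\{z \mid y; \theta^{(p-1)}\} \ln f(y, z; \theta)\, \mu(dz) \\ &= \sum_{i=1}^{H} \sum_{m=1}^{M} p\{z_i = m \mid y_i; \theta^{(p-1)}\} \ln\left[\pi_m f(y_i \mid z_i = m; \alpha)\right]. \end{aligned}$$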


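Step M (maximisation): Update $\alpha$ by constrained ML:

$$\alpha^{(p)} = \arg\max_{\alpha} \sum_{i=1}^{H} \sum_{m=1}^{M} p\{z_i = m \mid y_i; \theta^{(p-1)}\} \ln f(y_i \mid z_i = m; \alpha)$$

(i.e. duplicate each individual observation $M$ times and assign to each copy a weight equal to the posterior probability $p\{z_i = m \mid y_i; \theta^{(p-1)}\}$), and update $\pi$ as

$$\pi^{(p)}_m = \frac{1}{H} \sum_{i=1}^{H} p\{z_i = m \mid y_i; \theta^{(p-1)}\}.$$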
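The E and M steps can be illustrated on the simplest possible discrete mixture. The sketch below assumes, purely for illustration, that $f(y_i \mid z_i = m; \alpha)$ is a normal density with mean $\alpha_m$ and unit variance, so that the weighted ML step reduces to a posterior-weighted mean; the function names, starting values and simulated data are hypothetical.

```python
import math
import random


def normal_pdf(y, mean):
    """Density of N(mean, 1) at y (unit variance is an illustrative assumption)."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)


def em_mixture(y, means, pi, n_iter=50):
    """EM for an M-component mixture of N(mean_m, 1) densities.

    Returns the updated means, the mixing probabilities and the integrated
    log-likelihood evaluated at the start of each iteration.
    """
    H, M = len(y), len(means)
    loglik_path = []
    for _ in range(n_iter):
        # Step E: posterior probabilities p{z_i = m | y_i} by Bayes' rule.
        post, loglik = [], 0.0
        for yi in y:
            joint = [pi[m] * normal_pdf(yi, means[m]) for m in range(M)]
            total = sum(joint)
            loglik += math.log(total)
            post.append([j / total for j in joint])
        loglik_path.append(loglik)
        # Step M: posterior-weighted ML for the means, posterior averages for pi.
        weight = [sum(post[i][m] for i in range(H)) for m in range(M)]
        means = [sum(post[i][m] * y[i] for i in range(H)) / weight[m] for m in range(M)]
        pi = [weight[m] / H for m in range(M)]
    return means, pi, loglik_path


# Simulated data: 30% of draws from N(-2, 1), 70% from N(3, 1).
rng = random.Random(0)
y = [rng.gauss(-2.0, 1.0) if rng.random() < 0.3 else rng.gauss(3.0, 1.0)
     for _ in range(1000)]
means_hat, pi_hat, path = em_mixture(y, means=[-1.0, 1.0], pi=[0.5, 0.5])
print(means_hat, pi_hat)
```

In the dynamic discrete choice setting, the density $f(y_i \mid z_i = m; \alpha)$ would be the type-specific likelihood of an individual history, so each M step requires solving the dynamic programme for every type.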


Application: education and career choice

See for example the presentation by Keane and Wolpin (JPE, 1997).

Model of education and career choices.

• Data: 11-year panel (National Longitudinal Survey of Youth): cohort of youths aged 16 in 1979 and followed until 1990.

• Objective: evaluate policy effects such as education subsidies.

• Population studied is a cohort of individuals starting at the age of 16 and retiring at 65.

• Choices: blue collar worker ($k = 1$), white collar worker ($k = 2$), military ($k = 3$), education ($k = 4$) or inactivity ($k = 5$).


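Model

• Payoffs associated with choices $k = 1, 2, 3$ are the corresponding wages, the log of which is

$$\ln R_k(t) = e_k(16) + e_{k1}\, \mathrm{EDUC}(t) + e_{k2}\, \mathrm{EXP}_k(t) - e_{k3} \left[\mathrm{EXP}_k(t)\right]^2 + \varepsilon_k(t)$$

where $e_k(16)$ is the intercept (initial condition), $\mathrm{EDUC}(t)$ is the number of years of education, and $\mathrm{EXP}_k(t)$ is occupation-$k$ specific experience (= number of years spent working as $k$, with $\mathrm{EXP}_k(16) = 0$).

• Education's instantaneous payoff (or cost if negative):

$$R_4(t) = e_4(16) - c_1 \underbrace{\mathbf{1}\left[\mathrm{EDUC}(t) \geq 12\right]}_{\text{HS graduate}} - c_2 \underbrace{\mathbf{1}\left[\mathrm{EDUC}(t) \geq 16\right]}_{\text{college graduate}} + \varepsilon_4(t).$$

• Leisure utility: $R_5(t) = e_5(16) + \varepsilon_5(t)$.

• State variable: $S(t) = \left(e(16), \mathrm{EDUC}(t), \mathrm{EXP}(t), \varepsilon(t)\right)$ with

$$\begin{cases} e(16) = \left(e_1(16), \ldots, e_5(16)\right), \\ \mathrm{EXP}(t) = \left(\mathrm{EXP}_1(t), \mathrm{EXP}_2(t), \mathrm{EXP}_3(t)\right), \\ \varepsilon(t) = \left(\varepsilon_1(t), \ldots, \varepsilon_5(t)\right). \end{cases}$$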


Model (continued)

• Heterogeneity $\nu$:

– four groups $m = 1, 2, 3, 4$.

– $e(16) = \left(e_1(16), \ldots, e_5(16)\right)$ group-specific.

– as $\mathrm{EDUC}(16) = 9$ or $10$, assume different proportions of each type given $\mathrm{EDUC}(16)$:

$$\Pr\{\nu = m \mid \mathrm{EDUC}(16)\} \equiv \pi_{m, \mathrm{EDUC}(16)}.$$

• State probabilities:

– $\varepsilon(t) = \left(\varepsilon_1(t), \ldots, \varepsilon_5(t)\right)$ iid and $\sim N(0, \Sigma)$, with $\mathrm{Cov}\left(\varepsilon_k(t), \varepsilon_\ell(t)\right) = 0$ for $k \neq \ell$ with $\ell$ or $k \geq 4$ (i.e. only $\varepsilon_1(t), \varepsilon_2(t), \varepsilon_3(t)$, corresponding to employment spells, are correlated).

– Education: $\mathrm{EDUC}(t+1) = \mathrm{EDUC}(t) + d_4(t)$.

– Experience: $\mathrm{EXP}_k(t+1) = \mathrm{EXP}_k(t) + d_k(t)$.

• Value functions: $V_k(S(t), t) = R_k(t) + \beta\, E\left[V(S(t+1), t+1) \mid d_k(t) = 1\right]$, where $\varepsilon(t+1)$ is the only risk factor (not predetermined) in $V(S(t+1), t+1)$ given $d(t)$.


Value functions

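$$V_k(S(t), t) = R_k(t) + \beta\, E\left[V(S(t+1), t+1) \mid d_k(t) = 1\right]$$

where $\varepsilon(t+1)$ is the only risk factor (not predetermined) in $V(S(t+1), t+1)$ given $d(t)$, as the other state variables evolve deterministically:

• For $k = 1, 2, 3$:

$$\begin{cases} \mathrm{EXP}_\ell(t+1) = \mathrm{EXP}_\ell(t) + \mathbf{1}(\ell = k), & \ell = 1, 2, 3, \\ \mathrm{EDUC}(t+1) = \mathrm{EDUC}(t). \end{cases}$$

• For $k = 4$:

$$\begin{cases} \mathrm{EXP}_\ell(t+1) = \mathrm{EXP}_\ell(t), & \ell = 1, 2, 3, \\ \mathrm{EDUC}(t+1) = \mathrm{EDUC}(t) + 1. \end{cases}$$

• For $k = 5$:

$$\begin{cases} \mathrm{EXP}_\ell(t+1) = \mathrm{EXP}_\ell(t), & \ell = 1, 2, 3, \\ \mathrm{EDUC}(t+1) = \mathrm{EDUC}(t). \end{cases}$$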
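The deterministic law of motion above is simple enough to write down directly. The sketch below uses a hypothetical `State` container and function name; it covers only the predetermined part of the state (education and the three experience counters), not the shocks.

```python
from typing import NamedTuple, Tuple


class State(NamedTuple):
    """Predetermined part of the state: years of education and experience."""
    educ: int
    exp: Tuple[int, int, int]   # (EXP_1, EXP_2, EXP_3)


def next_state(state: State, k: int) -> State:
    """State at t+1 given the choice k in {1, ..., 5} made at t."""
    if k in (1, 2, 3):
        # Working in occupation k adds one year of occupation-k experience.
        exp = tuple(e + (1 if l == k else 0) for l, e in enumerate(state.exp, start=1))
        return State(state.educ, exp)
    if k == 4:
        # Schooling adds one year of education.
        return State(state.educ + 1, state.exp)
    if k == 5:
        # Inactivity leaves education and experience unchanged.
        return state
    raise ValueError("k must be in 1..5")


start = State(educ=10, exp=(0, 0, 0))
print(next_state(start, 2), next_state(start, 4))
```

Because each choice maps the current state to a single successor, the expectation in $V_k$ is taken over $\varepsilon(t+1)$ only.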


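Likelihood

• Individual observations: $y^h(t) = \left(d^h(t), w^h(t)\right)$, $t = 16, \ldots, 26$, where $d^h(t) = \left(d^h_1(t), \ldots, d^h_5(t)\right)$ is occupation choice and $w^h(t) = \sum_{k=1}^{3} d^h_k(t)\, R^h_k(t)$ is current wage (missing if not working).

• Sample likelihood:

$$L = \prod_{h=1}^{H} \left[\sum_{m=1}^{4} \pi_{m, \mathrm{EDUC}^h(16)}\, \ell^h\left(y^h(16), \ldots, y^h(26) \mid e^h(16), \mathrm{EDUC}^h(16)\right)\right],$$

where, in the $m$-th term, $e^h(16)$ takes the value specific to group $m$.

• Likelihood for individual $h$:

$$\ell^h\left(y^h(16), y^h(17), \ldots, y^h(26) \mid e^h(16), \mathrm{EDUC}^h(16)\right) = \prod_{t=16}^{26} \ell^h\left(y^h(t) \mid e^h(16), \mathrm{EDUC}^h(t), \mathrm{EXP}^h(t)\right).$$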


Likelihood (continued)

Likelihood for individual $h$ at time $t$: $\ell^h\left(y^h(t) \mid e^h(16), \mathrm{EDUC}^h(t), \mathrm{EXP}^h(t)\right)$ is computed as follows (we omit the conditioning to simplify notation).

This case differs from the general one studied above, as the wage information tells us about the shocks $\varepsilon^h_k(t)$.

• Case $d^h(t) = k \in \{1, 2, 3\}$: one thus knows that $w^h(t) = R^h_k(t)$ and $V_k(S(t), t) \geq V_\ell(S(t), t)$, $\ell \neq k$:

$$\ell^h\left(y^h(t)\right) = \Pr\Big\{V_k(S^h(t), t) \geq V_\ell(S^h(t), t),\ \forall \ell \neq k \,\Big|\, \underbrace{R^h_k(t) = w^h(t)}_{\text{determines } \varepsilon^h_k(t)}\Big\} \times \underbrace{\mathrm{pdf}\{R^h_k(t) = w^h(t)\}}_{\text{density of } R^h_k(t) \text{ at observation } w^h(t)}.$$

• Other cases: one only knows that $V_k(S^h(t), t) \geq V_\ell(S^h(t), t)$, $\ell \neq k$:

$$\ell^h\left(y^h(t)\right) = \Pr\{V_k(S^h(t), t) \geq V_\ell(S^h(t), t),\ \forall \ell \neq k\}.$$


Likelihood (continued)

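Given $e^h(16), \mathrm{EDUC}^h(t), \mathrm{EXP}^h(t)$,

$$\mathrm{pdf}\{R^h_k(t) = w^h(t)\} = \frac{1}{w^h(t)} \frac{1}{\sigma_k}\, \varphi\!\left(\frac{\overbrace{\ln w^h(t) - e_k(16) - e_{k1}\, \mathrm{EDUC}(t) - e_{k2}\, \mathrm{EXP}_k(t) + e_{k3} \left[\mathrm{EXP}_k(t)\right]^2}^{\varepsilon^h_k(t)}}{\sigma_k}\right)$$

where $\sigma_k^2 = \mathrm{Var}(\varepsilon_k(t))$, and

$$\Pr\{V_k(S^h(t), t) \geq V_\ell(S^h(t), t),\ \forall \ell \neq k \mid \varepsilon^h_k(t)\} = \Pr\{\varepsilon^h_\ell(t) \leq g_\ell(t),\ \forall \ell \neq k \mid \varepsilon^h_k(t)\}$$

where

$$g_\ell(t) = \ln\left\{V_k(S^h(t), t) - \beta\, E\left[V(S^h(t+1), t+1) \mid d_\ell(t) = 1\right]\right\} - e_\ell(16) - e_{\ell 1}\, \mathrm{EDUC}(t) - e_{\ell 2}\, \mathrm{EXP}_\ell(t) + e_{\ell 3} \left[\mathrm{EXP}_\ell(t)\right]^2, \quad \ell = 1, 2, 3,$$

$$g_4(t) = V_k(S^h(t), t) - \beta\, E\left[V(S^h(t+1), t+1) \mid d_4(t) = 1\right] - e_4(16) + c_1 \mathbf{1}\left[\mathrm{EDUC}(t) \geq 12\right] + c_2 \mathbf{1}\left[\mathrm{EDUC}(t) \geq 16\right],$$

$$g_5(t) = V_k(S^h(t), t) - \beta\, E\left[V(S^h(t+1), t+1) \mid d_5(t) = 1\right] - e_5(16).$$

One has to compute the cdf of a vector of 4 normal r.v.'s. (The computation is simplified by the fact that $\varepsilon^h_4(t)$ and $\varepsilon^h_5(t)$ are assumed independent and independent of $\varepsilon^h_1(t)$, $\varepsilon^h_2(t)$ and $\varepsilon^h_3(t)$.)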
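The wage density term is a log-normal density: a normal density for the implied shock, times the Jacobian $1/w^h(t)$. A minimal sketch in logs follows; the function name, argument names and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def log_wage_density(w, educ, exp_k, e0, e1, e2, e3, sigma_k):
    """Log-density of the occupation-k wage R_k at the observed value w.

    Model: ln R_k = e0 + e1*EDUC + e2*EXP_k - e3*EXP_k**2 + eps_k,
    with eps_k ~ N(0, sigma_k**2). The -ln(w) term is the Jacobian of the
    change of variable from eps_k to R_k.
    """
    eps = math.log(w) - e0 - e1 * educ - e2 * exp_k + e3 * exp_k ** 2
    z = eps / sigma_k
    log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    return -math.log(w) - math.log(sigma_k) + log_phi


# Hypothetical parameter values, for illustration only.
params = dict(e0=1.0, e1=0.05, e2=0.03, e3=0.001, sigma_k=0.4)
density = math.exp(log_wage_density(10.0, educ=12, exp_k=5, **params))
print(density)
```

Working in logs is a design choice: the individual likelihood is a product over many periods, and summing log terms avoids numerical underflow.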


Lastly, $\Pr\{V_k(S^h(t), t) \geq V_\ell(S^h(t), t),\ \forall \ell \neq k\}$ can be computed by numerical integration of $\Pr\{V_k(S^h(t), t) \geq V_\ell(S^h(t), t),\ \forall \ell \neq k \mid \varepsilon^h_k(t)\}$ with respect to $\varepsilon^h_k(t)$.
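The idea of integrating a conditional probability over one shock can be sketched in a deliberately simplified setting. The code below assumes additive shocks that are iid $N(0, \sigma^2)$ across alternatives, which is not the specification of the application (there, wage shocks enter in logs and $\varepsilon_1, \varepsilon_2, \varepsilon_3$ are correlated). The function names, the grid size and the integration range are hypothetical choices.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def choice_probability(k, v, sigma=1.0, n_grid=2001, width=8.0):
    """Pr{v[k] + eps_k >= v[l] + eps_l for all l != k}, iid N(0, sigma^2) shocks.

    Conditional on eps_k = e, the event has probability
    prod_{l != k} Phi((v[k] + e - v[l]) / sigma); this is integrated against
    the density of eps_k with the trapezoidal rule on [-width*sigma, width*sigma].
    """
    lo, hi = -width * sigma, width * sigma
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for n in range(n_grid):
        e = lo + n * step
        cond = 1.0
        for l, v_l in enumerate(v):
            if l != k:
                cond *= norm_cdf((v[k] + e - v_l) / sigma)
        weight = 0.5 if n in (0, n_grid - 1) else 1.0   # trapezoid end points
        total += weight * cond * norm_pdf(e / sigma) / sigma * step
    return total


values = [0.0, 0.5, -0.3]    # hypothetical deterministic parts of the values
probs = [choice_probability(k, values) for k in range(len(values))]
print(probs, sum(probs))
```

With two alternatives the result can be checked against the closed form $\Phi\big((v_0 - v_1)/(\sigma\sqrt{2})\big)$, and the probabilities over all alternatives should sum to one up to integration error.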


Results

See article.

The fit is excellent.

They find a very limited effect of college tuition subsidies (exogenous change in $c_2$).
