Structural Econometrics: Dynamic Discrete Choice
Jean-Marc Robin
Plan
1. Dynamic discrete choice models
2. Application: college and career choice
Dynamic discrete choice models
See for example the presentation by Wolpin (AER, 1996).
At each discrete date $t$, an individual has to choose one action among $K$ possible actions.
Let
\[ d_k(t) = \begin{cases} 1 & \text{if } k \text{ is the chosen action,} \\ 0 & \text{otherwise.} \end{cases} \]
Let $d(t) = (d_1(t), \ldots, d_K(t))$ or $d(t) = \sum_{k=1}^{K} k\, d_k(t)$ be the choice variable.
Let $S(t) \in \mathcal{S}$ be the state variable (i.e. the information available at the beginning of period $t$, when the action is chosen). Assume $\mathcal{S}$ is discrete: $\mathcal{S} = \{s_1, \ldots, s_N\}$ (in any case the computer will require a discrete state space).
Action $k$ yields payoff $R_k(S(t), t)$.
The state transition probability matrix is
\[ p_{ij}(k, t) = \Pr\{S(t+1) = s_j \mid S(t) = s_i,\ d_k(t) = 1\}. \]
Strategies
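A strategy is a sequence of functions
\[ D(\cdot, t) : \mathcal{S} \to \{0, 1\}^K, \qquad s \mapsto D(s, t) = (D_1(s, t), \ldots, D_K(s, t)). \]
Individuals seek the strategy $D$ that maximises the expected discounted sum of future payoffs:
\[ V(S(t), t) = \max_{D(\cdot,\cdot)} E\left[ \sum_{\tau=t}^{T} \beta^{\tau - t} \sum_{k=1}^{K} D_k(S(\tau), \tau)\, R_k(S(\tau), \tau) \,\middle|\, S(t) \right], \]
where $\beta$ is the discount factor.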
Bellman principle
Write, for $s \in \mathcal{S}$,
\[ V(s, t) = \max\{V_1(s, t), \ldots, V_K(s, t)\} \]
where $V_k(s, t)$ is the present value if action $k$ is chosen at $t$ when $S(t) = s$:
\[ V_k(s, t) = R_k(s, t) + \beta\, E[V(S(t+1), t+1) \mid S(t) = s,\ d_k(t) = 1] \]
and
\[ V_k(s, T) = R_k(s, T). \]
The optimal strategy is
\[ D_k(s, t) = 1 \iff V_k(s, t) = \max\{V_1(s, t), \ldots, V_K(s, t)\} \]
and then
\[ V(s, t) = \sum_{k=1}^{K} D_k(s, t)\, V_k(s, t). \]
Solution
Start from the terminal period $T$ and, for all $s \in \mathcal{S}$, determine the action which maximises the payoff $R_k(s, T)$:
\[ D_k(s, T) = 1 \iff R_k(s, T) = \max\{R_1(s, T), \ldots, R_K(s, T)\} \]
and
\[ V(s, T) = \sum_{k=1}^{K} D_k(s, T)\, R_k(s, T). \]
Then determine $D(s, t)$ recursively: for all $s \in \mathcal{S}$,
\[ D_k(s, t) = 1 \iff V_k(s, t) = \max\{V_1(s, t), \ldots, V_K(s, t)\} \]
where, for all $s_i \in \{s_1, \ldots, s_N\}$,
\[ V_k(s_i, t) = R_k(s_i, t) + \beta\, E[V(S(t+1), t+1) \mid S(t) = s_i,\ d_k(t) = 1] = R_k(s_i, t) + \beta \sum_{j=1}^{N} p_{ij}(k, t) \underbrace{V(s_j, t+1)}_{= \sum_{k=1}^{K} D_k(s_j, t+1)\, V_k(s_j, t+1)}. \]
Curse of dimensionality: a huge number of computations and a large memory size are required to compute $V_k(s, t)$ for all $k, s, t$.
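The recursion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `solve_backward`, the callables `payoff` and `trans`, the zero-based indexing of states, actions and dates, and the toy numbers are all hypothetical and do not come from the slides.

```python
from typing import Callable, List, Tuple


def solve_backward(
    N: int, K: int, T: int, beta: float,
    payoff: Callable[[int, int, int], float],       # payoff(k, i, t) = R_k(s_i, t)
    trans: Callable[[int, int, int], List[float]],  # trans(k, i, t)[j] = p_ij(k, t)
) -> Tuple[List[List[float]], List[List[int]]]:
    """Return V[t][i] = V(s_i, t) and D[t][i] = optimal action, for t = 0..T."""
    V = [[0.0] * N for _ in range(T + 1)]
    D = [[0] * N for _ in range(T + 1)]
    for t in range(T, -1, -1):                      # start from the terminal period
        for i in range(N):
            best_k, best_v = 0, float("-inf")
            for k in range(K):
                v = payoff(k, i, t)                 # V_k(s_i, T) = R_k(s_i, T)
                if t < T:                           # add the discounted continuation value
                    p = trans(k, i, t)
                    v += beta * sum(p[j] * V[t + 1][j] for j in range(N))
                if v > best_v:
                    best_k, best_v = k, v
            V[t][i], D[t][i] = best_v, best_k
    return V, D


# Toy example (hypothetical numbers): 2 states, 2 actions, dates 0, 1, 2.
def payoff(k: int, i: int, t: int) -> float:
    return float(i) if k == 0 else -0.5             # action 1 is a costly investment


def trans(k: int, i: int, t: int) -> List[float]:
    if k == 0:
        return [1.0 if j == i else 0.0 for j in range(2)]  # action 0: state unchanged
    return [0.0, 1.0]                               # action 1: move to state 1 for sure


V, D = solve_backward(N=2, K=2, T=2, beta=0.9, payoff=payoff, trans=trans)
```

The triple loop makes the cost explicit: each date requires on the order of K × N × N operations and the value table holds N × (T + 1) numbers, which is where the curse of dimensionality bites when N is large.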
Estimation
Parameters: in the payoff functions $R_k(s, t)$ and transition probabilities $p_{ij}(k, t)$.
Inference: maximum likelihood or (simulated) method of moments.
Data: individual sequences
\[ y^h = \left( x^h(t_0^h), d^h(t_0^h), x^h(t_0^h + 1), d^h(t_0^h + 1), \ldots, x^h(t_1^h), d^h(t_1^h) \right) \]
for individuals $h = 1, \ldots, H$ and $t \in \{t_0^h, t_0^h + 1, \ldots, t_1^h\}$, where $x^h(t) \in \{x_1, \ldots, x_I\}$ is the observed part of the state variables, i.e. $S^h(t) = (x^h(t), \varepsilon^h(t))$, with the following assumptions on the process of shocks $\varepsilon^h(t)$:
• $\varepsilon^h(t) = (\varepsilon_1^h(t), \ldots, \varepsilon_K^h(t))$ iid;
• $R_k(S^h(t), t) = R_k(x^h(t), t) + \varepsilon_k^h(t)$;
• conditional independence:
\[ \Pr\{x^h(t+1), \varepsilon^h(t+1) \mid x^h(t), \varepsilon^h(t), d_k^h(t) = 1\} = \Pr(\varepsilon^h(t+1)) \times \underbrace{\Pr\{x^h(t+1) = x_j \mid x^h(t) = x_i,\ d_k^h(t) = 1\}}_{\equiv\, p_{ij}(k, t)}. \]
Likelihood
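The conditional likelihood of $y^h$ given $x^h(t_0^h)$ is
\[ \ell(y^h \mid x^h(t_0^h)) = \Pr\{d^h(t_0^h) \mid x^h(t_0^h)\} \times \Pr\{x^h(t_0^h + 1) \mid x^h(t_0^h), d^h(t_0^h)\} \times \Pr\{d^h(t_0^h + 1) \mid x^h(t_0^h + 1)\} \times \Pr\{x^h(t_0^h + 2) \mid x^h(t_0^h + 1), d^h(t_0^h + 1)\} \times \cdots \times \Pr\{d^h(t_1^h) \mid x^h(t_1^h)\} \]
where
\[ \Pr\{d_k^h(t) = 1 \mid x^h(t)\} = \Pr\{\varepsilon^h(t) \text{ s.t. } D_k(x^h(t), \varepsilon^h(t), t) = 1 \mid x^h(t)\}. \]
The conditional likelihood of the sample is
\[ \prod_{h=1}^{H} \ell(y^h \mid x^h(t_0^h)). \]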
Choice probabilities
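\[ \Pr\{d_k^h(t) = 1 \mid x^h(t) = x_i\} = \Pr\{\varepsilon_k^h(t) \ge \varepsilon_m^h(t) + \bar V_m(x_i, t) - \bar V_k(x_i, t),\ \forall m \ne k \mid x^h(t) = x_i\}, \]
where
\[ \bar V_k(x_i, t) = R_k(x_i, t) + \beta \sum_{j=1}^{I} p_{ij}(k, t)\, V(x_j, t+1), \]
\[ p_{ij}(k, t) = \Pr\{x^h(t+1) = x_j \mid x^h(t) = x_i,\ d_k^h(t) = 1\}, \]
\[ V(x_j, t+1) = E \max_{k = 1, \ldots, K} \left[ \bar V_k(x_j, t+1) + \varepsilon_k^h(t+1) \right]. \]
For instance, for $(X_1, X_2)$ Gaussian with means $m_1, m_2$, standard deviations $\sigma_1, \sigma_2$ and correlation $\rho$,
\[ E \max\{X_1, X_2\} = E X_2 + E \max\{X_1 - X_2, 0\} = m_2 + (m_1 - m_2)\, \Phi\!\left( \frac{m_1 - m_2}{\sigma} \right) + \sigma\, \varphi\!\left( \frac{m_1 - m_2}{\sigma} \right), \]
where $\sigma = \mathrm{Std}(X_1 - X_2) = \sqrt{\sigma_1^2 + \sigma_2^2 - 2 \rho \sigma_1 \sigma_2}$, and $\Phi$ and $\varphi$ denote the standard normal cdf and pdf.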
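As a quick numerical check of the Gaussian Emax formula, the sketch below codes the closed form and compares it with a crude Monte Carlo average. The function names, the seed and the parameter values are illustrative assumptions, not part of the original slides.

```python
import math
import random


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def emax_gaussian(m1: float, m2: float, s1: float, s2: float, rho: float) -> float:
    """Closed-form E max{X1, X2} for jointly Gaussian (X1, X2)."""
    sigma = math.sqrt(s1 ** 2 + s2 ** 2 - 2.0 * rho * s1 * s2)  # Std(X1 - X2)
    if sigma == 0.0:                       # X1 - X2 is degenerate
        return max(m1, m2)
    z = (m1 - m2) / sigma
    return m2 + (m1 - m2) * norm_cdf(z) + sigma * norm_pdf(z)


def emax_monte_carlo(m1, m2, s1, s2, rho, draws=200_000, seed=0) -> float:
    """Simulation counterpart, used only as a sanity check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = m1 + s1 * u
        x2 = m2 + s2 * (rho * u + math.sqrt(1.0 - rho ** 2) * v)  # Corr(x1, x2) = rho
        total += max(x1, x2)
    return total / draws


exact = emax_gaussian(1.0, 0.5, 1.0, 2.0, 0.3)
approx = emax_monte_carlo(1.0, 0.5, 1.0, 2.0, 0.3)
```

With more than two alternatives no such simple closed form is available in general, which is why the integrated value function usually has to be approximated numerically.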
Two stage estimation
One can proceed in two stages to save computer time, although at the cost of some efficiency loss.
1. Maximise the partial likelihood of state changes:
\[ \prod_{h=1}^{H} \Pr\{x^h(t_0^h + 1) \mid x^h(t_0^h), d^h(t_0^h)\}\, \Pr\{x^h(t_0^h + 2) \mid x^h(t_0^h + 1), d^h(t_0^h + 1)\} \cdots \Pr\{x^h(t_1^h) \mid x^h(t_1^h - 1), d^h(t_1^h - 1)\}, \]
with respect to the parameters of $\Pr\{x(t+1) \mid x(t), d(t), t\}$.
2. Maximise the likelihood of the sequence of decisions:
\[ \prod_{h=1}^{H} \Pr\{d^h(t_0^h) \mid x^h(t_0^h)\} \times \cdots \times \Pr\{d^h(t_1^h) \mid x^h(t_1^h)\} \]
using the estimated $\Pr\{x(t+1) \mid x(t), d(t), t\}$ to compute the present value functions necessary to calculate choice probabilities.
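For the first stage, a minimal sketch is given below under two simplifying assumptions that are not in the slides: the transition probabilities are left unrestricted (one free probability per cell) and do not depend on $t$. In that special case the partial-likelihood maximiser is just the matrix of observed transition frequencies. The function name and the toy sequences are hypothetical.

```python
from typing import List, Optional, Tuple

Sequence = List[Tuple[int, int]]  # [(x(t0), d(t0)), (x(t0+1), d(t0+1)), ...], zero-based codes


def estimate_transitions(
    sequences: List[Sequence], n_states: int, n_actions: int
) -> List[List[Optional[List[float]]]]:
    """Return P[k][i][j], the frequency estimate of Pr{x(t+1)=j | x(t)=i, d(t)=k}.

    P[k][i] is None when the pair (state i, action k) is never observed.
    """
    counts = [[[0] * n_states for _ in range(n_states)] for _ in range(n_actions)]
    for seq in sequences:
        # Pair each (state, action) with the state observed one period later.
        for (x, d), (x_next, _) in zip(seq, seq[1:]):
            counts[d][x][x_next] += 1
    P: List[List[Optional[List[float]]]] = []
    for k in range(n_actions):
        rows: List[Optional[List[float]]] = []
        for i in range(n_states):
            total = sum(counts[k][i])
            rows.append([c / total for c in counts[k][i]] if total > 0 else None)
        P.append(rows)
    return P


# Two hypothetical individuals observed over 3 and 4 periods.
sequences = [
    [(0, 1), (1, 0), (1, 0)],
    [(0, 1), (0, 0), (0, 1), (1, 0)],
]
P = estimate_transitions(sequences, n_states=2, n_actions=2)
```

The estimated `P` would then be held fixed in the second stage, where only the payoff parameters are varied when solving for the value functions.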
Unobserved heterogeneity
The two-stage estimation procedure does not work if there exists unobserved heterogeneity.
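Assume that $S^h(t) = (x^h(t), \varepsilon^h(t), \nu^h)$ where $\nu^h \in \{1, \ldots, M\}$ indicates a particular way of grouping individuals. All individuals with the same $\nu^h$ have a specific value of the parameters governing payoff functions and state probabilities.
Let $\Pr\{\nu^h = m\} = \pi_m$, $m \in \{1, \ldots, M\}$.
The likelihood becomes
\[ \prod_{h=1}^{H} \ell(y^h \mid x^h(t_0^h)) = \prod_{h=1}^{H} \left( \sum_{m=1}^{M} \pi_m\, \ell(y^h \mid x^h(t_0^h), m) \right) \]
where
\[ \ell(y^h \mid x^h(t_0^h), \nu^h) = \Pr\{d^h(t_0^h) \mid x^h(t_0^h), \nu^h\} \times \Pr\{x^h(t_0^h + 1) \mid x^h(t_0^h), \nu^h, d^h(t_0^h)\} \times \Pr\{d^h(t_0^h + 1) \mid x^h(t_0^h + 1), \nu^h\} \times \Pr\{x^h(t_0^h + 2) \mid x^h(t_0^h + 1), \nu^h, d^h(t_0^h + 1)\} \times \cdots \times \Pr\{d^h(t_1^h) \mid x^h(t_1^h), \nu^h\}. \]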
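Because each type-specific likelihood is a product of many probabilities, it is usually handled in logs. The sketch below evaluates the mixture log-likelihood with the log-sum-exp trick to avoid underflow. It assumes the type-specific log-likelihoods have already been computed; the function name and the numbers are illustrative only.

```python
import math
from typing import List


def mixture_loglik(type_logliks: List[List[float]], pi: List[float]) -> float:
    """Sum over h of ln( sum_m pi[m] * exp(type_logliks[h][m]) ).

    type_logliks[h][m] is the log-likelihood of individual h's sequence
    conditional on belonging to type m; pi[m] is the type probability.
    """
    total = 0.0
    for row in type_logliks:
        terms = [math.log(p) + ll for p, ll in zip(pi, row) if p > 0.0]
        c = max(terms)                                  # factor out the largest term
        total += c + math.log(sum(math.exp(a - c) for a in terms))
    return total


# Hypothetical values: two individuals, two types.
logliks = [[math.log(0.2), math.log(0.4)], [math.log(0.5), math.log(0.1)]]
value = mixture_loglik(logliks, [0.5, 0.5])

# Long sequences give very negative log-likelihoods; exp(-1000) underflows to 0,
# but the log-sum-exp form stays finite.
tiny = mixture_loglik([[-1000.0, -1001.0]], [0.5, 0.5])
```

Since the sum over types sits inside the logarithm, the decision and transition parts of the likelihood no longer separate.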
EM algorithm
Let $y = (y_1, \ldots, y_H)$ be a vector of observations. Let $z = (z_1, \ldots, z_H)$ be unobserved covariates. The likelihood of $(y, z)$ is $f(y, z; \theta)$.
Since $z$ is not observed, one estimates $\theta$ by maximising the integrated likelihood:
\[ f(y; \theta) = \int f(y, z; \theta)\, \mu(dz). \]
This integral may be difficult to compute and the numerical approximation may yield unstable Newton-type optimisation algorithms (numerical errors accumulate instead of averaging). The EM algorithm is often preferable.
The EM algorithm iterates the following steps until numerical convergence (generally slowly):
\[ \theta^{(p)} = \arg\max_{\theta} Q(\theta \mid \theta^{(p-1)}), \]
where
\[ Q(\theta \mid \theta^{(p-1)}) = E\left[ \ln f(y, z; \theta) \mid y; \theta^{(p-1)} \right] = \int p\{z \mid y; \theta^{(p-1)}\} \ln f(y, z; \theta)\, \mu(dz). \]
Each iteration increases the likelihood, and the sequence converges toward a local maximum of the likelihood.
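The monotonicity claim follows from a short, standard argument, sketched here for completeness. The term $H$ is notation introduced only for this sketch and does not appear in the slides.

```latex
% Decompose the log-likelihood using f(y;\theta) = f(y,z;\theta) / p\{z \mid y;\theta\}
% and take expectations with respect to p\{z \mid y;\theta^{(p-1)}\}:
\ln f(y;\theta) = Q(\theta \mid \theta^{(p-1)}) - H(\theta \mid \theta^{(p-1)}),
\qquad
H(\theta \mid \theta^{(p-1)}) = \int p\{z \mid y;\theta^{(p-1)}\} \ln p\{z \mid y;\theta\}\, \mu(dz).

% Jensen's inequality gives H(\theta \mid \theta^{(p-1)}) \le H(\theta^{(p-1)} \mid \theta^{(p-1)})
% for every \theta, and \theta^{(p)} maximises Q(\cdot \mid \theta^{(p-1)}), hence
\ln f(y;\theta^{(p)}) - \ln f(y;\theta^{(p-1)})
  \ge Q(\theta^{(p)} \mid \theta^{(p-1)}) - Q(\theta^{(p-1)} \mid \theta^{(p-1)})
  \ge 0.
```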
EM algorithm: discrete mixtures
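Assume $z_i \in \{1, \ldots, M\}$ and $\pi_m = \Pr\{z_i = m\}$.
Then $\theta = (\alpha, \pi)$ where $\alpha$ indexes $f(y_i \mid z_i; \alpha)$ and $\pi = (\pi_1, \ldots, \pi_M)$.
We have
\[ f(y, z; \theta) = \prod_{i=1}^{H} f(y_i, z_i; \theta) = \prod_{i=1}^{H} \prod_{m=1}^{M} \left[ \pi_m f(y_i \mid z_i = m; \alpha) \right]^{1\{z_i = m\}}, \]
and the integrated likelihood is
\[ f(y; \theta) = \prod_{i=1}^{H} \left[ \sum_{m=1}^{M} \pi_m f(y_i \mid z_i = m; \alpha) \right]. \]
Step E (expectation): Use Bayes' rule to compute posterior probabilities:
\[ p\{z_i = m \mid y_i; \theta^{(p-1)}\} = \frac{\pi_m^{(p-1)} f(y_i \mid z_i = m; \alpha^{(p-1)})}{\sum_{m'=1}^{M} \pi_{m'}^{(p-1)} f(y_i \mid z_i = m'; \alpha^{(p-1)})} \]
and
\[ Q(\theta \mid \theta^{(p-1)}) = \int p\{z \mid y; \theta^{(p-1)}\} \ln f(y, z; \theta)\, \mu(dz) = \sum_{i=1}^{H} \sum_{m=1}^{M} p\{z_i = m \mid y_i; \theta^{(p-1)}\} \ln\left[ \pi_m f(y_i \mid z_i = m; \alpha) \right]. \]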
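Step M (maximisation): Update $\alpha$ by constrained ML:
\[ \alpha^{(p)} = \arg\max_{\alpha} \sum_{i=1}^{H} \sum_{m=1}^{M} p\{z_i = m \mid y_i; \theta^{(p-1)}\} \ln f(y_i \mid z_i = m; \alpha) \]
(i.e. duplicate individual observations $M$ times and assign to each copy a weight equal to the posterior probability $p\{z_i = m \mid y_i; \theta^{(p-1)}\}$) and update $\pi$ as
\[ \pi_m^{(p)} = \frac{1}{H} \sum_{i=1}^{H} p\{z_i = m \mid y_i; \theta^{(p-1)}\}. \]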
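The E and M steps can be illustrated on a deliberately simple mixture. The sketch below assumes, purely for illustration, that $y_i$ is the number of times individual $i$ chooses a given action out of `n` periods and that, given type $m$, it is binomial with success probability `alpha[m]`. This toy model, the data and all names are hypothetical; the structural model in these slides would replace `binom_pmf` by the type-specific sequence likelihood.

```python
import math
from typing import List, Tuple


def binom_pmf(y: int, n: int, a: float) -> float:
    return math.comb(n, y) * a ** y * (1.0 - a) ** (n - y)


def loglik(ys: List[int], n: int, alpha: List[float], pi: List[float]) -> float:
    """Integrated log-likelihood: sum_i ln sum_m pi_m f(y_i | z_i = m; alpha)."""
    return sum(
        math.log(sum(p * binom_pmf(y, n, a) for p, a in zip(pi, alpha))) for y in ys
    )


def em_step(
    ys: List[int], n: int, alpha: List[float], pi: List[float]
) -> Tuple[List[float], List[float]]:
    M = len(pi)
    # Step E: posterior type probabilities by Bayes' rule.
    post = []
    for y in ys:
        joint = [pi[m] * binom_pmf(y, n, alpha[m]) for m in range(M)]
        s = sum(joint)
        post.append([j / s for j in joint])
    # Step M: weighted ML for alpha (closed form here) and averaging for pi.
    new_alpha, new_pi = [], []
    for m in range(M):
        w = sum(row[m] for row in post)
        new_alpha.append(sum(row[m] * y for row, y in zip(post, ys)) / (n * w))
        new_pi.append(w / len(ys))
    return new_alpha, new_pi


ys = [0, 1, 1, 2, 2, 3, 7, 8, 8, 9, 9, 10]   # hypothetical counts out of n = 10
n = 10
alpha, pi = [0.3, 0.6], [0.5, 0.5]           # arbitrary starting values
path = [loglik(ys, n, alpha, pi)]
for _ in range(50):
    alpha, pi = em_step(ys, n, alpha, pi)
    path.append(loglik(ys, n, alpha, pi))    # should never decrease
```

Tracking `path` is a cheap diagnostic: a decrease in the log-likelihood between two iterations signals a coding error in one of the two steps.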
Application: education and career choice
See for example the presentation by Keane and Wolpin (JPE, 1997).
Model of education and career choices.
• Data: 11-year panel (National Longitudinal Survey of Youth): cohort of youths aged 16 in 1979 and followed until 1990.
• Objective: evaluate policy effects such as education subsidies.
• Population studied is a cohort of individuals starting at the age of 16 and retiring at 65.
• Choices: blue-collar worker ($k = 1$), white-collar worker ($k = 2$), military ($k = 3$), education ($k = 4$) or inactivity ($k = 5$).
Model
• Payoffs associated with choices $k = 1, 2, 3$ are the corresponding wages, the logs of which are
\[ \ln R_k(t) = e_k(16) + e_{k1}\, EDUC(t) + e_{k2}\, EXP_k(t) - e_{k3} [EXP_k(t)]^2 + \varepsilon_k(t) \]
where $e_k(16)$ is the intercept (initial condition), $EDUC(t)$ is the number of years of education, and $EXP_k(t)$ is occupation-$k$ specific experience (the number of years spent working in occupation $k$, with $EXP_k(16) = 0$).
• Education's instantaneous payoff (or cost if negative):
\[ R_4(t) = e_4(16) - c_1 \underbrace{1[EDUC(t) \ge 12]}_{\text{HS graduate}} - c_2 \underbrace{1[EDUC(t) \ge 16]}_{\text{college graduate}} + \varepsilon_4(t). \]
• Leisure utility: $R_5(t) = e_5(16) + \varepsilon_5(t)$.
• State variable: $S(t) = (e(16), EDUC(t), EXP(t), \varepsilon(t))$ with
\[ \begin{cases} e(16) = (e_1(16), \ldots, e_5(16)), \\ EXP(t) = (EXP_1(t), EXP_2(t), EXP_3(t)), \\ \varepsilon(t) = (\varepsilon_1(t), \ldots, \varepsilon_5(t)). \end{cases} \]
Model (continued)
• Heterogeneity $\nu$:
  - four groups $m = 1, 2, 3, 4$.
  - $e(16) = (e_1(16), \ldots, e_5(16))$ is group-specific.
  - as $EDUC(16) = 9$ or $10$, assume different proportions of each type given $EDUC(16)$:
\[ \Pr\{\nu = m \mid EDUC(16)\} \equiv \pi_{m, EDUC(16)}. \]
• State probabilities:
  - $\varepsilon(t) = (\varepsilon_1(t), \ldots, \varepsilon_5(t))$ iid and $\sim N(0, \Sigma)$, with $\mathrm{Cov}(\varepsilon_k(t), \varepsilon_\ell(t)) = 0$ for $k \ne \ell$ with $\ell$ or $k \ge 4$ (i.e. only $\varepsilon_1(t), \varepsilon_2(t), \varepsilon_3(t)$, corresponding to employment spells, are correlated).
  - Education: $EDUC(t+1) = EDUC(t) + d_4(t)$.
  - Experience: $EXP_k(t+1) = EXP_k(t) + d_k(t)$.
• Value functions:
\[ V_k(S(t), t) = R_k(t) + \beta\, E[V(S(t+1), t+1) \mid d_k(t) = 1] \]
where $\varepsilon(t+1)$ is the only risk factor (not predetermined) in $V(S(t+1), t+1)$ given $d(t)$.
Value functions
\[ V_k(S(t), t) = R_k(t) + \beta\, E[V(S(t+1), t+1) \mid d_k(t) = 1] \]
where $\varepsilon(t+1)$ is the only risk factor (not predetermined) in $V(S(t+1), t+1)$ given $d(t)$:
• For $k = 1, 2, 3$:
\[ \begin{cases} EXP_\ell(t+1) = EXP_\ell(t) + 1(\ell = k), & \ell = 1, 2, 3; \\ EDUC(t+1) = EDUC(t). \end{cases} \]
• For $k = 4$:
\[ \begin{cases} EXP_\ell(t+1) = EXP_\ell(t), & \ell = 1, 2, 3; \\ EDUC(t+1) = EDUC(t) + 1. \end{cases} \]
• For $k = 5$:
\[ \begin{cases} EXP_\ell(t+1) = EXP_\ell(t), & \ell = 1, 2, 3; \\ EDUC(t+1) = EDUC(t). \end{cases} \]
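The three cases above amount to a deterministic update of the observed state given the choice. The sketch below writes that update as a function and counts the states reachable after ten periods, which gives a feel for the size of the state space. The `State` layout, the function names and the starting point are illustrative assumptions.

```python
from typing import Set, Tuple

# Observed part of the state: (EDUC, (EXP_1, EXP_2, EXP_3)).
State = Tuple[int, Tuple[int, int, int]]


def next_state(state: State, k: int) -> State:
    """State at t+1 given the state at t and the choice k in {1, ..., 5}."""
    if k not in (1, 2, 3, 4, 5):
        raise ValueError("k must be in 1..5")
    educ, exp = state
    # Working in occupation k (k = 1, 2, 3) adds one year of occupation-k experience.
    e1, e2, e3 = (e + (1 if ell == k else 0) for ell, e in enumerate(exp, start=1))
    # Choosing education (k = 4) adds one year of schooling; leisure (k = 5) changes nothing.
    return educ + (1 if k == 4 else 0), (e1, e2, e3)


def reachable(start: State, periods: int) -> Set[State]:
    """All observed states that can be reached after the given number of periods."""
    frontier = {start}
    for _ in range(periods):
        frontier = {next_state(s, k) for s in frontier for k in range(1, 6)}
    return frontier


start: State = (10, (0, 0, 0))           # hypothetical individual with EDUC(16) = 10
n_states_age_26 = len(reachable(start, 10))
```

Even with only four counters, the number of distinct observed states grows polynomially with age, and each of them requires an expectation over $\varepsilon(t+1)$ when solving the model.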
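Likelihood
• Individual observations: $y^h(t) = (d^h(t), w^h(t))$, $t = 16, \ldots, 26$, where $d^h(t) = (d_1^h(t), \ldots, d_5^h(t))$ is the occupation choice and $w^h(t) = \sum_{k=1}^{3} d_k^h(t) R_k^h(t)$ is the current wage (missing if not working).
• Sample likelihood:
\[ L = \prod_{h=1}^{H} \left[ \sum_{m=1}^{4} \pi_{m, EDUC^h(16)}\, \ell^h\!\left( y^h(16), \ldots, y^h(26) \mid e^h(16), EDUC^h(16) \right) \right], \]
where, in the $m$-th term of the sum, $e^h(16)$ takes the value specific to group $m$.
• Likelihood for individual $h$:
\[ \ell^h\!\left( y^h(16), y^h(17), \ldots, y^h(26) \mid e^h(16), EDUC^h(16) \right) = \prod_{t=16}^{26} \ell^h\!\left( y^h(t) \mid e^h(16), EDUC^h(t), EXP^h(t) \right). \]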
Likelihood (continued)
The likelihood for individual $h$ at time $t$, $\ell^h\!\left( y^h(t) \mid e^h(16), EDUC^h(t), EXP^h(t) \right)$, is computed as follows (we omit the conditioning to simplify notation).
This differs from the general case studied above, because the wage information tells us about the shocks $\varepsilon_k^h(t)$.
• Case $d^h(t) = k \in \{1, 2, 3\}$: one thus knows that $w^h(t) = R_k^h(t)$ and $V_k(S(t), t) \ge V_\ell(S(t), t)$, $\ell \ne k$:
\[ \ell^h\!\left( y^h(t) \right) = \Pr\Big\{ V_k(S^h(t), t) \ge V_\ell(S^h(t), t),\ \forall \ell \ne k \;\Big|\; \underbrace{R_k^h(t) = w^h(t)}_{\text{determines } \varepsilon_k^h(t)} \Big\} \times \underbrace{\mathrm{pdf}\{R_k^h(t) = w^h(t)\}}_{\text{i.e. density of } R_k^h(t) \text{ at observation } w^h(t)}. \]
• Other cases: one only knows that $V_k(S^h(t), t) \ge V_\ell(S^h(t), t)$, $\ell \ne k$:
\[ \ell^h\!\left( y^h(t) \right) = \Pr\{ V_k(S^h(t), t) \ge V_\ell(S^h(t), t),\ \forall \ell \ne k \}. \]
Likelihood (continued)
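Given $e^h(16), EDUC^h(t), EXP^h(t)$,
\[ \mathrm{pdf}\{R_k^h(t) = w^h(t)\} = \frac{1}{w^h(t)} \frac{1}{\sigma_k}\, \varphi\!\left( \frac{\overbrace{\ln w^h(t) - e_k(16) - e_{k1} EDUC(t) - e_{k2} EXP_k(t) + e_{k3} [EXP_k(t)]^2}^{\varepsilon_k^h(t)}}{\sigma_k} \right) \]
where $\sigma_k^2 = \mathrm{Var}(\varepsilon_k(t))$, and
\[ \Pr\{ V_k(S^h(t), t) \ge V_\ell(S^h(t), t),\ \forall \ell \ne k \mid \varepsilon_k^h(t) \} = \Pr\{ \varepsilon_\ell^h(t) \le g_\ell(t),\ \forall \ell \ne k \mid \varepsilon_k^h(t) \} \]
where
\[ g_\ell(t) = \ln\!\left( V_k(S^h(t), t) - \beta\, E[V(S^h(t+1), t+1) \mid d_\ell(t) = 1] \right) - e_\ell(16) - e_{\ell 1} EDUC(t) - e_{\ell 2} EXP_\ell(t) + e_{\ell 3} [EXP_\ell(t)]^2, \quad \ell = 1, 2, 3; \]
\[ g_4(t) = V_k(S^h(t), t) - \beta\, E[V(S^h(t+1), t+1) \mid d_4(t) = 1] - e_4(16) + c_1\, 1[EDUC(t) \ge 12] + c_2\, 1[EDUC(t) \ge 16]; \]
\[ g_5(t) = V_k(S^h(t), t) - \beta\, E[V(S^h(t+1), t+1) \mid d_5(t) = 1] - e_5(16). \]
One has to compute the cdf of a vector of 4 normal r.v.'s. (The computation is simplified by the fact that $\varepsilon_4^h(t)$ and $\varepsilon_5^h(t)$ are assumed independent of each other and of $\varepsilon_1^h(t)$, $\varepsilon_2^h(t)$ and $\varepsilon_3^h(t)$.)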
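The wage density above is a log-normal density, with the factor $1/w^h(t)$ coming from the change of variable from $\ln w$ to $w$. A minimal sketch follows; the function name, argument names and parameter values are hypothetical and chosen only to exercise the formula.

```python
import math


def wage_density(w: float, educ: float, exp_k: float,
                 e16: float, e1: float, e2: float, e3: float, sigma: float) -> float:
    """Density of R_k(t) at wage w when
    ln R_k = e16 + e1*EDUC + e2*EXP_k - e3*EXP_k**2 + eps,  eps ~ N(0, sigma**2)."""
    if w <= 0.0:
        return 0.0
    # The observed wage pins down the shock eps_k(t).
    eps = math.log(w) - e16 - e1 * educ - e2 * exp_k + e3 * exp_k ** 2
    z = eps / sigma
    # (1 / w) * (1 / sigma) * phi(eps / sigma)
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * w)


# Hypothetical parameter values, for illustration only.
params = dict(educ=12, exp_k=3, e16=2.0, e1=0.1, e2=0.05, e3=0.001, sigma=0.4)

# Sanity check: the density integrates to one (Riemann sum on a grid in u = ln w).
step = 0.01
total = sum(
    wage_density(math.exp(u), **params) * math.exp(u) * step
    for u in (-2.0 + step * j for j in range(1101))
)
```

In the likelihood, this density multiplies the conditional probability that the chosen occupation is optimal, evaluated at the shock implied by the observed wage.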
Lastly, $\Pr\{ V_k(S^h(t), t) \ge V_\ell(S^h(t), t),\ \forall \ell \ne k \}$ can be computed by numerical integration of $\Pr\{ V_k(S^h(t), t) \ge V_\ell(S^h(t), t),\ \forall \ell \ne k \mid \varepsilon_k^h(t) \}$ w.r.t. $\varepsilon_k^h(t)$.
Results
See article.
The fit is excellent.
They find a very limited effect of college tuition subsidies (exogenous change in $c_2$).