Discrete Choice Modeling William Greene Stern School of Business New York University Lab Sessions.
Spatial Discrete Choice Models Professor William Greene Stern School of Business, New York...
-
Upload
lionel-oneal -
Category
Documents
-
view
221 -
download
1
Transcript of Spatial Discrete Choice Models Professor William Greene Stern School of Business, New York...
Spatial Discrete Choice Models
Professor William GreeneStern School of Business, New York University
Spatial Correlation
Per Capita Income in Monroe County, New York, USA
Spatially Autocorrelated Data
The Hypothesis of Spatial Autocorrelation
Spatial Discrete Choice Modeling: Agenda
Linear Models with Spatial Correlation Discrete Choice Models Spatial Correlation in Nonlinear Models
· Basics of Discrete Choice Models· Maximum Likelihood Estimation
Spatial Correlation in Discrete Choice· Binary Choice· Ordered Choice· Unordered Multinomial Choice· Models for Counts
Linear Spatial Autocorrelation
ii
( ) ( ) , N observations on a spatially
arranged variable
'contiguity matrix;' 0
spatial a
x i W x i ε
W W
W must be specified in advance. It is not estimated.
2
1
2 -1
utocorrelation parameter, -1 < < 1.
E[ ]= Var[ ]=
( ) [ ]
E[ ]= Var[ ]= [( ) ( )]
ε 0, ε I
x i I W ε = Spatial "moving average" form
x i, x I W I W
Testing for Spatial Autocorrelation
Spatial Autocorrelation
2
2 2
.
E[ ]= Var[ ]=
E[ ]=
Var[ ]
y Xβ Wε
ε| X 0, ε| X I
y| X Xβ
y| X = WW
A Generalized Regression Model
Spatial Autoregression in a Linear Model
2
1
1 1
1
2 -1
+ .
E[ ]= Var[ ]=
[ ] ( )
[ ] [ ]
E[ ]=[ ]
Var[ ] [( ) ( )]
y Wy Xβ ε
ε| X 0, ε| X I
y I W Xβ ε
I W Xβ I W ε
y| X I W Xβ
y| X = I W I W
Complications of the Generalized Regression Model
Potentially very large N – GPS data on agriculture plots
Estimation of . There is no natural residual based estimator
Complicated covariance structure – no simple transformations
Panel Data Application
it i it
E.g., N countries, T periods (e.g., gasoline data)
y c
= N observations at time t.
Similar assumptions
Candidate for SUR or Spatial Autocorrelation model.
it
t t t
x β
ε Wε v
Spatial Autocorrelation in a Panel
Alternative Panel Formulations
i,t t 1 i it
i
Pure space-recursive - dependence pertains to neighbors in period t-1
y [ ] regression +
Time-space recursive - dependence is pure autoregressive and on neighbors
in period t-1
y
Wy
,t i,t-1 t 1 i it
i,t i,t-1 t i it
y + [ ] regression +
Time-space simultaneous - dependence is autoregressive and on neighbors
in the current period
y y + [ ] regression +
Time-space dynamic -
Wy
Wy
i,t i,t-1 t i t 1 i it
dependence is autoregressive and on neighbors
in both current and last period
y y + [ ] + [ ] regression + Wy Wy
Analytical Environment
Generalized linear regression Complicated disturbance covariance
matrix Estimation platform
· Generalized least squares· Maximum likelihood estimation when
normally distributed disturbances (still GLS)
Discrete Choices Land use intensity in Austin, Texas –
Intensity = 1,2,3,4 Land Usage Types in France, 1,2,3 Oak Tree Regeneration in
Pennsylvania Number = 0,1,2,… (Many zeros)
Teenagers physically active = 1 or physically inactive = 0, in Bay Area, CA.
Discrete Choice Modeling
Discrete outcome reveals a specific choice
Underlying preferences are modeled
Models for observed data are usually not conditional means· Generally, probabilities of outcomes· Nonlinear models – cannot be estimated by any
type of linear least squares
Discrete Outcomes
Discrete Revelation of Underlying Preferences· Binary choice between two alternatives· Unordered choice among multiple
alternatives· Ordered choice revealing underlying
strength of preferences
Counts of Events
Simple Binary Choice: Insurance
Redefined Multinomial Choice
Fly Ground
Multinomial Unordered Choice - Transport Mode
Health Satisfaction (HSAT)
Self administered survey: Health Care Satisfaction? (0 – 10)
Continuous Preference Scale
Ordered Preferences at IMDB.com
Counts of Events
Modeling Discrete Outcomes
“Dependent Variable” typically labels an outcome· No quantitative meaning· Conditional relationship to covariates
No “regression” relationship in most cases
The “model” is usually a probability
Simple Binary Choice: Insurance
Decision: Yes or No = 1 or 0Depends on Income, Health, Marital Status, Gender
Multinomial Unordered Choice - Transport Mode
Decision: Which Type, A, T, B, C. Depends on Income, Price, Travel Time
Health Satisfaction (HSAT)
Self administered survey: Health Care Satisfaction? (0 – 10)
Outcome: Preference = 0,1,2,…,10Depends on Income, Marital Status, Children, Age, Gender
Counts of Events
Outcome: How many events at each location = 0,1,…,10Depends on Season, Population, Economic Activity
Nonlinear Spatial Modeling
Discrete outcome yit = 0, 1, …, J for some finite or infinite (count case) J.· i = 1,…,n· t = 1,…,T
Covariates xit .
Conditional Probability (yit = j) = a function of xit.
Two Platforms
Random Utility for Preference Models Outcome reveals underlying utility· Binary: u* = ’x y = 1 if u* > 0· Ordered: u* = ’x y = j if j-1 < u* < j
· Unordered: u*(j) = ’xj , y = j if u*(j) > u*(k)
Nonlinear Regression for Count Models Outcome is governed by a nonlinear regression· E[y|x] = g(,x)
Probit and Logit Models
Prob(y 1 or 0| ) = F( ) or [1- F( )]x x xi i i iθ θ
Implied Regression Function
Estimated Binary Choice Models:The Results Depend on F(ε)
LOGIT PROBIT EXTREME VALUE
Variable Estimate t-ratio Estimate t-ratio Estimate t-ratio
Constant -0.42085 -2.662 -0.25179 -2.600 0.00960 0.078
X1 0.02365 7.205 0.01445 7.257 0.01878 7.129
X2 -0.44198 -2.610 -0.27128 -2.635 -0.32343 -2.536
X3 0.63825 8.453 0.38685 8.472 0.52280 8.407
Log-L -2097.48 -2097.35 -2098.17
Log-L(0) -2169.27 -2169.27 -2169.27
+ 1 (X1+1) + 2 (X2) + 3 X3 (1 is positive)
Effect on Predicted Probability of an Increase in X1
Estimated Partial Effects vs. Coefficients
Applications: Health Care UsageGerman Health Care Usage Data, 7,293 Individuals, Varying Numbers of PeriodsVariables in the file areData downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. They can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. This is a large data set. There are altogether 27,326 observations. The number of observations ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987). (Downloaded from the JAE Archive)
DOCTOR = 1(Number of doctor visits > 0) HOSPITAL = 1(Number of hospital visits > 0) HSAT = health satisfaction, coded 0 (low) - 10 (high) DOCVIS = number of doctor visits in last three months HOSPVIS = number of hospital visits in last calendar year PUBLIC = insured in public health insurance = 1; otherwise = 0 ADDON = insured by add-on insurance = 1; otherswise = 0 HHNINC = household nominal monthly net income in German marks / 10000. (4 observations with income=0 were dropped) HHKIDS = children under age 16 in the household = 1; otherwise = 0 EDUC = years of schooling AGE = age in years FEMALE = 1 for female headed household, 0 for male EDUC = years of education
An Estimated Binary Choice Model
An Estimated Ordered Choice Model
An Estimated Count Data Model
210 Observations on Travel Mode Choice
CHOICE ATTRIBUTES CHARACTERISTICMODE TRAVEL INVC INVT TTME GC HINCAIR .00000 59.000 100.00 69.000 70.000 35.000TRAIN .00000 31.000 372.00 34.000 71.000 35.000BUS .00000 25.000 417.00 35.000 70.000 35.000CAR 1.0000 10.000 180.00 .00000 30.000 35.000AIR .00000 58.000 68.000 64.000 68.000 30.000TRAIN .00000 31.000 354.00 44.000 84.000 30.000BUS .00000 25.000 399.00 53.000 85.000 30.000CAR 1.0000 11.000 255.00 .00000 50.000 30.000AIR .00000 127.00 193.00 69.000 148.00 60.000TRAIN .00000 109.00 888.00 34.000 205.00 60.000BUS 1.0000 52.000 1025.0 60.000 163.00 60.000CAR .00000 50.000 892.00 .00000 147.00 60.000AIR .00000 44.000 100.00 64.000 59.000 70.000TRAIN .00000 25.000 351.00 44.000 78.000 70.000BUS .00000 20.000 361.00 53.000 75.000 70.000CAR 1.0000 5.0000 180.00 .00000 32.000 70.000
An Estimated Unordered Choice Model
Maximum Likelihood EstimationCross Section Case
Binary Outcome
Random Utility: y* = +
Observed Outcome: y = 1 if y* > 0,
0 if y* 0.
Probabilities: P(y=1|x) = Prob(y* > 0| )
x
x
= Prob( > - )
P(y=0|x) = Prob(y* 0| )
= Prob( - )
Likelihood for the sample = joint probability
x
x
x
i i1
i i1
= Prob(y=y| )
Log Likelihood = logProb(y=y| )
x
x
n
i
n
i
Cross Section Case
1 1 1 1
2 2 2 2
1 1
2 2
| or >
| or > Prob Prob
... ...
| or >
Prob( or > )
Prob( or > ) =
...
Prob( or >
x x
x x
x x
x
x
x
n n n n
n
y j
y j
y j
)
We operate on the marginal probabilities of n observations
n
Log Likelihoods for Binary Choice Models
1
2
Logl( | )= logF 2 1
Probit
1 F(t) = (t) exp( t / 2)dt
2
(t)dt
Logit
exp(t) F(t) = (t) =
1 exp(t)
X,y xn
i ii
t
t
y
Spatially Correlated ObservationsCorrelation Based on Unobservables
1 1 1 1 1
2 2 2 2 2 2
u u 0
u u 0 ~ f ,
... ... ... ...
u u 0
In the cross section case, = .
= the usual spatial weight matrix .
x
x
x
W WW
WW I
n n n n n
y
y
y
Now, it is a full
matrix. The joint probably is a single n fold integral.
Spatially Correlated ObservationsCorrelated Utilities
* *1 1 1 11 1
* *12 2 2 22 2
* *
... ...... ...
In the cross section case
= the usual spatial weight matrix .
x x
x x
x x
W I W
Wn n n nn n
y y
y y
y y
, = . Now, it is a full
matrix. The joint probably is a single n fold integral.
W I
Log Likelihood
In the unrestricted spatial case, the log likelihood is one term,
LogL = log Prob(y1|x1, y2|x2, … ,yn|xn)
In the discrete choice case, the probability will be an n fold integral, usually for a normal distribution.
LogL for an Unrestricted BC Model
1
1 1 1 2 12 1 1 1
2 2 1 2 21 2 2 2
1 1 2 2
1 ...
1 ...LogL( | )=log ...
... ... ... ... ... ...
... 1
1 if y = 0 and
x xX,y
n
n n
n nn
n n n n n n n
i i
q q q w q q w
q q q w q q wd
q q qw q q w
q
+1 if y = 1.
One huge observation - n dimensional normal integral.
Not feasible for any reasonable sample size.
Even if computable, provides no device for estimating sampling standard errors.
i
Solution Approaches for Binary Choice
Distinguish between private and social shocks and use pseudo-ML
Approximate the joint density and use GMM with the EM algorithm
Parameterize the spatial correlation and use copula methods
Define neighborhoods – make W a sparse matrix and use pseudo-ML
Others …
Pseudo Maximum LikelihoodSmirnov, A., “Modeling Spatial Discrete Choice,” Regional Science and Urban Economics, 40, 2010.
1 1
1
0
Spatial Autoregression in Utilities
* * , 1( * ) for all n individuals
* ( ) ( )
( ) ( ) assumed convergent
=
= + where
t
t
y Wy X y y 0
y I W X I W
I W W
A
D A-D
1
= diagonal elements
*
Private Social
Then
aProb[y 1 or 0| ] F (2 1) , p
nj ij j
i ii
yd
D
y AX D A-D
Suppose individuals ignore the social "shocks."
xX
robit or logit.
Pseudo Maximum Likelihood
Assumes away the correlation in the reduced form
Makes a behavioral assumption
Requires inversion of (I-W)
Computation of (I-W) is part of the optimization process - is estimated with .
Does not require multidimensional integration (for a logit model, requires no integration)
GMMPinske, J. and Slade, M., (1998) “Contracting in Space: An Application of Spatial Statistics to Discrete Choice Models,” Journal of Econometrics, 85, 1, 125-154.Pinkse, J. , Slade, M. and Shen, L (2006) “Dynamic Spatial Discrete Choice Using One Step GMM: An Application to Mine Operating Decisions”, Spatial Economic Analysis, 1: 1, 53 — 99.
1
*= + , = +
= [ - ]
= u
Cross section case: =0
Probit Model: FOC for estimation of is based on the
ˆ generalized residuals ui
y W u
I W u
A
Xθ ε
1
= y [ | ]
( ( )) ( ) =
( )[1 ( )]
Spatially autocorrelated case: Moment equations are still
valid. Complication is computing the variance of the
i i
n i i iii
i i
E y
y
x xx 0
x x
moment
equations, which requires some approximations.
GMM
1
*= + , = +
= [ - ]
= u
Autocorrelated Case: 0
Probit Model: FOC for estimation of is based on the
ˆ generalized residuals u
y W u
I W u
A
Xθ ε
1
= y [ | ]
( ) ( ) =
1( ) ( )
i i i
i ii
n ii iiii
i i
ii ii
E y
ya a
a a
x x
z 0x x
Requires at least K+1 instrumental variables.
GMM Approach
Spatial autocorrelation induces heteroscedasticity that is a function of
Moment equations include the heteroscedasticity and an additional instrumental variable for identifying .
LM test of = 0 is carried out under the null hypothesis that = 0.
Application: Contract type in pricing for 118 Vancouver service stations.
Copula Method and ParameterizationBhat, C. and Sener, I., (2009) “A copula-based closed-form binary logit choice modelfor accommodating spatial correlation across observational units,” Journal of Geographical Systems, 11, 243–272
* *
1 2
Basic Logit Model
y , y 1[y 0] (as usual)
Rather than specify a spatial weight matrix, we assume
[ , ,..., ] have an n-variate distribution.
Sklar's Theorem represents the joint distribut
i i i i i
n
x
1 1 2 2
ion in terms
of the continuous marginal distributions, ( ) and a copula
function C[u = ( ) ,u ( ) ,...,u ( ) | ]i
n n
Copula Representation
Model
Likelihood
Parameterization
Other Approaches
Case (1992): Define “regions” or neighborhoods. No correlation across regions. Produces essentially a panel data probit model.
Beron and Vijverberg (2003): Brute force integration using GHK simulator in a probit model.
Others. See Bhat and Sener (2009).
Case A (1992) Neighborhood influence and technological change. Economics 22:491–508Beron KJ, Vijverberg WPM (2004) Probit in a spatial context: a monte carlo analysis. In: Anselin L, Florax RJGM, Rey SJ (eds) Advances in spatial econometrics: methodology, tools and applications. Springer, Berlin
Ordered Probability Model
1
1 2
2 3
J-1 J
j-1
y* , we assume contains a constant term
y 0 if y* 0
y = 1 if 0 < y*
y = 2 if < y*
y = 3 if < y*
...
y = J if < y*
In general: y = j if < y*
βx x
j
-1 o J j-1 j,
, j = 0,1,...,J
, 0, , j = 1,...,J
Outcomes for Health Satisfaction
A Spatial Ordered Choice ModelWang, C. and Kockelman, K., (2009) Bayesian Inference for Ordered Response Data with a Dynamic Spatial Ordered Probit Model, Working Paper, Department of Civil and Environmental Engineering, Bucknell University.
* *1
* *1
Core Model: Cross Section
y , y = j if y , Var[ ] 1
Spatial Formulation: There are R regions. Within a region
y u , y = j if y
Spatial he
i i i i j i j i
ir ir i ir ir j ir j
βx
βx2
2
1 2 1
teroscedasticity: Var[ ]
Spatial Autocorrelation Across Regions
= + , ~ N[ , ]
= ( - ) ~ N[ , {( - ) ( - )} ]
The error distribution depends on 2 para
ir r
v
v
u Wu v v 0 I
u I W v 0 I W I W2meters, and
Estimation Approach: Gibbs Sampling; Markov Chain Monte Carlo
Dynamics in latent utilities added as a final step: y*(t)=f[y*(t-1)].
v
OCM for Land Use Intensity
OCM for Land Use Intensity
Estimated Dynamic OCM
Unordered Multinomial Choice
Underlying Random Utility for Each Alternative
U(i,j) = , i = individual, j = alternative
Preference Revelation
Y(i) = j if and only if U(i,j) > U(i,k
j ij ij
Core Random Utility Model
x
1
1
) for all k j
Model Frameworks
Multinomial Probit: [ ,..., ] ~N[0, ]
Multinomial Logit: [ ,..., ] ~ type I extreme valueJ
J iid
Multinomial Unordered Choice - Transport Mode
Decision: Which Type, A, T, B, C. Depends on Income, Price, Travel Time
Spatial Multinomial ProbitChakir, R. and Parent, O. (2009) “Determinants of land use changes: A spatial multinomial probit approach, Papers in Regional Science, 88, 2, 328-346.
1
Utility Functions, land parcel i, usage type j, date t
U(i,j,t)=
Spatial Correlation at Time t
Modeling Framework: Normal / Multinomial Probit
Estimation: MCMC
jt ijt ik ijt
n
ij il lkl
x
w
- Gibbs Sampling
Modeling Counts
Canonical Model
Poisson Regression
y = 0,1,...
exp( ) Prob[y = j|x] =
!
Conditional Mean = exp( x)
Signature Feature: Equidispersion
Usual Alternative: Various forms of Negative Binomial
Spatial E
j
j
1
ffect: Filtered through the mean
= exp( x + )
=
i i i
n
i im m imw
Rathbun, S and Fei, L (2006) “A Spatial Zero-Inflated Poisson Regression Model for Oak Regeneration,” Environmental Ecology Statistics, 13, 2006, 409-426