Page 1: 20. Count Data - faculty.econ.ucdavis.edu

20. Count Data
A. Colin Cameron, Pravin K. Trivedi
Copyright 2006

These slides were prepared in 2002. They cover material similar to Chapter 20 of our subsequent book Microeconometrics: Methods and Applications, Cambridge University Press, 2005.


OUTLINE

1. Introduction and Example
2. Poisson Regression
   - MLE and Quasi-MLE
3. Richer Fully Parametric Cross-section Models
   - Negative binomial
   - Hurdle or two-part and with-zeros
   - Finite mixtures and latent class
4. Complications
   - Time Series, Multivariate
   - Panel (emphasized here)
   - Sample Selection, Endogeneity
   - Semiparametric, Bayesian


1A. INTRODUCTION

• Count data models are for a dependent variable $y = 0, 1, 2, \ldots$

• Two leading examples:
  - $y$: number of doctor visits (usually cross-section); $x$: health status, age, gender, ...
  - $y$: number of patent applications (usually panel); $x$: current and lagged R&D expenditure

• Here we emphasize cross-section data and short panels.

• Many approaches and issues are general nonlinear model issues.

• Pecking order: continuous, tobit, binary/multinomial, duration, counts.


1B. HEALTH EXAMPLE

• Many surveys, such as the U.S. National Health Interview Survey (NHIS), measure health use as counts, as people have better recall of counts than of dollars spent.

• The Australian Health Survey 1977-78 has many such measures, e.g. number of doctor visits in the past 2 weeks ($n = 5190$):

# Visits      0     1     2     3     4     5     6     7     8     9
Freq       4141   782   174    30    24     9    12    12     5     1
Rel Freq   .798  .151  .033  .006  .005  .002  .002  .002  .001  .000
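The frequency table above is enough to recover the sample mean and variance of the doctor-visits count. The following is a minimal sketch using only the tabulated frequencies (variable names are illustrative); it shows the raw data are overdispersed, with the variance roughly twice the mean, which motivates the robust standard errors discussed later.

```python
import numpy as np

# Frequencies of 0, 1, ..., 9 doctor visits, as tabulated above
visits = np.arange(10)
freq = np.array([4141, 782, 174, 30, 24, 9, 12, 12, 5, 1])

n = freq.sum()                      # sample size
rel_freq = freq / n                 # relative frequencies
mean = (visits * rel_freq).sum()    # sample mean of the count
var = ((visits - mean) ** 2 * rel_freq).sum()  # sample variance (divisor n)

print(f"n = {n}, mean = {mean:.3f}, variance = {var:.3f}")
print("relative frequencies:", np.round(rel_freq, 3))
```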


1B. HEALTH EXAMPLE (continued)

• Interest is in the role of health insurance in health service use.

• Regressors are grouped into four categories:
  - Socioeconomic: SEX, AGE, AGESQ, INCOME
  - Health insurance status indicators: LEVYPLUS, FREEPOOR, FREEREPA, LEVY (omitted)
  - Recent health status measures: ILLNESS, ACTDAYS
  - Long-term health status measures: HSCORE, CHCOND1, CHCOND2


2. POISSON REGRESSION: SUMMARY

• Poisson regression is straightforward: many packages do Poisson regression, and coefficients are easily interpreted as semi-elasticities.

• Do Poisson rather than OLS with dependent variable $y$ or $\ln y$ (with adjustment for $\ln 0$) or variance-stabilizing transformations such as $\sqrt{y}$.

• Poisson MLE is consistent provided only that $E[y|x] = \exp(x'\beta)$. But when doing Poisson, make sure standard errors etc. are robust to $V[y|x] \neq E[y|x]$.


2A. POISSON MODEL

• From stochastic process theory, the natural model for counts is $y \sim \text{Poisson}(\mu)$:
  - Density: $f(y) = e^{-\mu}\mu^{y}/y!$
  - Moments: $E[y] = \mu$, $V[y] = \mu$

• The regression model lets the Poisson rate parameter vary across individuals with $x$ in a way that ensures $\mu > 0$. The exponential function achieves this:

$$\mu = E[y|x] = \exp(x'\beta).$$

• This common starting fully parametric model is too restrictive.


2B. POISSON MLE

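• MLE is straightforward given data independent over $i$.

• The ML f.o.c. are that the residual is orthogonal to the regressors. As a result consistency does not require the Poisson distribution (see below).

$$\sum_{i=1}^{n} \left(y_i - \exp(x_i'\beta)\right) x_i = 0.$$

• Details: $f(y) = e^{-\mu}\mu^{y}/y!$ and $\mu = \exp(x'\beta)$

$$\Rightarrow \ln f(y) = -\exp(x'\beta) + y\,x'\beta - \ln y!$$

$$\Rightarrow L(\beta) = \sum_{i=1}^{n} \left\{ -\exp(x_i'\beta) + y_i x_i'\beta - \ln y_i! \right\}$$

$$\Rightarrow \partial L/\partial\beta = \sum_{i=1}^{n} \left\{ -\exp(x_i'\beta)\,x_i + y_i x_i \right\}.$$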
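The first-order conditions above can be solved by Newton-Raphson, since the Hessian is $-\sum_i \exp(x_i'\beta)\,x_i x_i'$. The following is a minimal sketch on simulated data; the function name `poisson_mle`, the simulated design and the true coefficients are assumptions made for illustration, not part of the slides.

```python
import numpy as np

def poisson_mle(X, y, tol=1e-10, max_iter=100):
    """Newton-Raphson for the Poisson log-likelihood with mean exp(X @ beta)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)               # sum_i (y_i - mu_i) x_i
        hessian = -(X * mu[:, None]).T @ X   # -sum_i mu_i x_i x_i'
        step = np.linalg.solve(hessian, score)
        beta = beta - step                   # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (hypothetical design: intercept plus one standard normal regressor)
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-1.0, 0.5])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat = poisson_mle(X, y)
score_at_hat = X.T @ (y - np.exp(X @ beta_hat))  # f.o.c.: should be numerically zero
print("beta_hat:", beta_hat)
print("score at beta_hat:", score_at_hat)
```

Because the log-likelihood is globally concave in $\beta$, a plain Newton iteration from zero is usually adequate for well-scaled regressors.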


2B. POISSON MLE: DOCTOR VISITS REGRESSION

Variable               Coeff    Robust      Average          dE[y|x]/dx_j
                                st.error*   dE[y|x]/dx_j     at x = x-bar
ONE                   -2.224     .190         -                -
SEX                     .157     .056*        .047             .035
AGE (years/100)        1.056    1.001         -                -
AGESQ                  -.849    1.078         -                -
INCOME ($10,000)       -.205     .088*       -.062            -.047
ILLNESS                 .187     .018*        .056             .043
ACTDAYS                 .127     .005*        .038             .029
HSCORE                  .030     .010*        .009             .007
CHCOND1 (not limit)     .114     .066         .026             .026
CHCOND2 (not limit)     .141     .083         .032             .032

* Note that the usual ML standard errors are not used as explained below.


2B. POISSON MLE: DOCTOR VISITS REGRESSION (continued)

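• Dependent variable is number of doctor visits. Regressors also include LEVYPLUS, FREEPOOR, FREEREPA.

• Robust se is the standard error assuming $V[y|x] = \phi \times E[y|x]$ (see below).

• Average effect over the sample of a change in $x_j$ is

$$\frac{1}{n}\sum_{i=1}^{n} \frac{\partial E[y_i|x_i]}{\partial x_{ij}} = \frac{1}{n}\sum_{i=1}^{n} \exp(x_i'\beta) \times \beta_j.$$

• Effect of a change in $x_j$ evaluated at $x = \bar{x}$ is

$$\left.\frac{\partial E[y|x]}{\partial x_j}\right|_{x=\bar{x}} = \exp(\bar{x}'\beta) \times \beta_j.$$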
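The two marginal-effect measures differ because the exponential function is convex: the sample average of $\exp(x_i'\beta)$ is at least $\exp(\bar{x}'\beta)$, so the average effect is never smaller in magnitude than the effect at $\bar{x}$ (as in the table, e.g. .047 versus .035 for SEX). A small sketch with made-up regressors and hypothetical coefficient values:

```python
import numpy as np

# Hypothetical regressors and "fitted" coefficients, for illustration only
rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([-1.0, 0.5])

mu = np.exp(X @ beta)                        # fitted conditional means exp(x_i' beta)
ame = mu.mean() * beta                       # average over the sample of exp(x_i' beta) * beta_j
mem = np.exp(X.mean(axis=0) @ beta) * beta   # exp(x-bar' beta) * beta_j

print("average marginal effects:  ", ame)
print("marginal effects at x-bar: ", mem)
```

Only the slope entry is meaningful as a marginal effect; the entry for the constant is computed here purely because the formula is applied to the whole coefficient vector.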


2C. POISSON MODEL: COEFFICIENT INTERPRETATION

• Key result for $E[y|x] = \exp(x'\beta)$ is that

$$\frac{\partial E[y|x]}{\partial x_j} = \exp(x'\beta) \times \beta_j = E[y|x] \times \beta_j.$$

1. Conditional mean is strictly monotonic increasing (or decreasing) in $x_j$ according to the sign of $\beta_j$.

2. Coefficients are semi-elasticities: $\beta_j$ is the proportionate change in the conditional mean when $x_{ij}$ changes by one unit.

3. Like all single-index models, if one coefficient is double another, then the effect of a one-unit change of the associated regressor is double that of the other.


2C. POISSON MODEL: COEFFICIENT INTERPRETATION (cont.)

As an example of coefficient interpretation consider the following.

• DVISITS = number of doctor visits.
• ACTDAYS = number of days of reduced activity.

• Poisson regression of DVISITS on ACTDAYS yields

$$E[\text{DVISITS}\,|\,\text{ACTDAYS}] = \exp(-1.529 + 0.158 \times \text{ACTDAYS}).$$

• So one more day of reduced activity leads to a 15.8 percent increase in doctor visits (calculus method), or a $100 \times [\exp(0.158) - 1] = 17.1$ percent increase (noncalculus method).
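The two percentage figures can be checked directly from the fitted equation. A short sketch (variable names are illustrative) computing the calculus approximation, the exact proportionate change, and the same exact change as a ratio of fitted means:

```python
import math

b0, b1 = -1.529, 0.158   # fitted intercept and ACTDAYS coefficient from the slide

calculus_pct = 100 * b1                 # semi-elasticity approximation
exact_pct = 100 * (math.exp(b1) - 1)    # exact effect of a one-unit change

# Same exact effect as a ratio of fitted means at ACTDAYS = 1 versus ACTDAYS = 0
mean_0 = math.exp(b0)
mean_1 = math.exp(b0 + b1)
ratio_pct = 100 * (mean_1 / mean_0 - 1)

print(f"calculus: {calculus_pct:.1f}%, exact: {exact_pct:.1f}%, from means: {ratio_pct:.1f}%")
```

The exact figure does not depend on the starting level of ACTDAYS, because the intercept and the base level cancel in the ratio.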


2D. POISSON QUASI-MLE

• What are the properties of the Poisson MLE if the density is misspecified?

• Poisson MLE is consistent provided only that $E[y|x] = \exp(x'\beta)$. This is not a general ML result; it holds in just a few models.

• Still need to correct standard errors if there is overdispersion (variance > mean) or underdispersion (variance < mean). Possible methods:
  1. MLE s.e.: assume Poisson, i.e. variance equals mean. Wrong.
  2. GLM robust s.e.: assume variance = $\phi$ times mean and calculate $\phi$.
  3. White robust: assume no functional form for the variance.

• Data are usually overdispersed, so 1. is wrong. Use 2. or 3. to get robust standard errors.


2D. POISSON QUASI-MLE: CONSISTENCY

• The MLE f.o.c. are

$$\sum_{i=1}^{n} \left(y_i - \exp(x_i'\beta)\right) x_i = 0.$$

• So the MLE is consistent if $E[y_i|x_i] = \exp(x_i'\beta)$.

• Thus consistency requires "only" a correct conditional mean!
  - Property shared by generalized linear models based on the linear exponential family: normal, binomial, Bernoulli, gamma, exponential, Poisson.

• Generalized linear models are the standard framework in statistics for nonlinear cross-section regression, including counts.

• Econometrics instead uses either ML/quasi-ML or GMM.


2D. POISSON QUASI-MLE: ROBUST STANDARD ERRORS

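• Correct (robust) standard errors for the Poisson quasi-MLE.

• Let $\mu_i = \exp(x_i'\beta)$ and $\sigma_i^2 = V[y_i|x_i]$.

• Then

$$V[\hat\beta] = \left(\sum_i \mu_i x_i x_i'\right)^{-1} \left(\sum_i \sigma_i^2 x_i x_i'\right) \left(\sum_i \mu_i x_i x_i'\right)^{-1}.$$

• If $\sigma_i^2 = \mu_i$, get the usual Poisson MLE variance $\left(\sum_i \mu_i x_i x_i'\right)^{-1}$.

• If $\sigma_i^2 = \phi\mu_i$, then get $\phi \left(\sum_i \mu_i x_i x_i'\right)^{-1}$, with

$$\hat\phi = (n-k)^{-1} \sum_i (y_i - \hat\mu_i)^2 / \hat\mu_i.$$

Usually $\phi > 1$ $\Rightarrow$ Poisson MLE overstates t statistics.

• If $\sigma_i^2$ is unspecified, then use White robust with $\sigma_i^2$ replaced by $(y_i - \hat\mu_i)^2$.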
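The three variance estimators above can be compared on simulated overdispersed data. This is a sketch under stated assumptions: the data are generated as a Poisson-gamma mixture (so the true variance is not a constant multiple of the mean), the helper names `fit_poisson` and `qmle_variances` are made up for illustration, and $\phi$ denotes the variance-to-mean multiple.

```python
import numpy as np

def fit_poisson(X, y, n_iter=50):
    """Poisson quasi-MLE by Newton-Raphson: solves sum_i (y_i - mu_i) x_i = 0."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        beta = beta + np.linalg.solve((X * mu[:, None]).T @ X, X.T @ (y - mu))
    return beta

def qmle_variances(X, y, beta_hat):
    """Return the MLE, GLM-robust and White-robust variance matrices plus phi-hat."""
    n, k = X.shape
    mu = np.exp(X @ beta_hat)
    A_inv = np.linalg.inv((X * mu[:, None]).T @ X)      # (sum mu_i x_i x_i')^{-1}
    phi = np.sum((y - mu) ** 2 / mu) / (n - k)          # Pearson dispersion estimate
    B = (X * ((y - mu) ** 2)[:, None]).T @ X            # sum (y_i - mu_i)^2 x_i x_i'
    return A_inv, phi * A_inv, A_inv @ B @ A_inv, phi

# Simulated overdispersed counts: Poisson with gamma heterogeneity (mean 1, variance 1)
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
mu_true = np.exp(X @ np.array([0.2, 0.3]))
y = rng.poisson(mu_true * rng.gamma(shape=1.0, scale=1.0, size=n))

beta_hat = fit_poisson(X, y)
V_mle, V_glm, V_white, phi_hat = qmle_variances(X, y, beta_hat)
se_mle, se_glm, se_white = (np.sqrt(np.diag(V)) for V in (V_mle, V_glm, V_white))
print("phi_hat:", phi_hat)
print("se (MLE, GLM robust, White):", se_mle, se_glm, se_white)
```

With overdispersed data both robust standard errors exceed the usual ML ones, which is the sense in which the Poisson MLE overstates t statistics.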


2E: POISSON: SUMMARY

• Poisson regression is straightforward: many packages do this, and coefficients are easily interpreted as semi-elasticities.

• Do Poisson rather than OLS with dependent variable $y$ or $\ln y$ (with adjustment for $\ln 0$) or variance-stabilizing transformations such as $\sqrt{y}$.

• Poisson MLE is consistent provided only that $E[y|x] = \exp(x'\beta)$. But when doing Poisson, make sure standard errors etc. are robust to $V[y|x] \neq E[y|x]$.


3. RICHER PARAMETRIC MODELS

• Data frequently exhibit "non-Poisson" features:
  - Overdispersion: conditional variance exceeds conditional mean, whereas Poisson imposes equality.
  - Excess zeros: higher frequency of zeros than predicted by Poisson with given mean.
  - Truncation from left: small counts excluded, e.g. 0.
  - Censoring from right: counts larger than some specified integer are grouped.

• This provides motivation for richer parametric models than basic Poisson.

• Some still have $E[y|x] = \exp(x'\beta)$, so only efficiency gains are at issue. Others have a different conditional mean, in which case the usual Poisson QMLE is inconsistent.


3A. NEGATIVE BINOMIAL MODEL

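• Negative binomial (Negbin 2) permits overdispersion.

$$f(y|\mu,\alpha) = \frac{\Gamma(y+\alpha^{-1})}{\Gamma(y+1)\,\Gamma(\alpha^{-1})} \left(\frac{\alpha^{-1}}{\alpha^{-1}+\mu}\right)^{\alpha^{-1}} \left(\frac{\mu}{\alpha^{-1}+\mu}\right)^{y}.$$

• Same mean: $E[y|x] = \mu = \exp(x'\beta)$.

• Different variance: $V[y|x] = \mu + \alpha\mu^2 = \exp(x'\beta) + \alpha(\exp(x'\beta))^2$.

• Estimate by ML.

• In practice little efficiency gain over Poisson with robust standard errors.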
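As a numerical check on the Negbin 2 density, the sketch below evaluates it on a grid of counts and confirms that it sums to one with mean $\mu$ and variance $\mu + \alpha\mu^2$. The value $\mu = 0.3$ is a hypothetical choice close to the sample mean of doctor visits, and $\alpha = 1.077$ is taken from the Negbin 2 estimates in these slides; the function name is illustrative.

```python
import math
import numpy as np

def negbin2_pmf(y, mu, alpha):
    """Negbin 2 density with mean mu and variance mu + alpha * mu**2."""
    a_inv = 1.0 / alpha
    log_p = (math.lgamma(y + a_inv) - math.lgamma(y + 1) - math.lgamma(a_inv)
             + a_inv * math.log(a_inv / (a_inv + mu))
             + y * math.log(mu / (a_inv + mu)))
    return math.exp(log_p)

mu, alpha = 0.3, 1.077   # mu is hypothetical; alpha is the reported estimate
ys = np.arange(400)      # truncation point chosen large enough that the tail is negligible
p = np.array([negbin2_pmf(int(y), mu, alpha) for y in ys])

total = p.sum()
mean = (ys * p).sum()
var = ((ys - mean) ** 2 * p).sum()
print(f"sum = {total:.6f}, mean = {mean:.6f}, variance = {var:.6f}")
print(f"mu + alpha*mu^2 = {mu + alpha * mu ** 2:.6f}")
```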


3A. NEGATIVE BINOMIAL MODEL: DOCTOR VISITS

Variable               Poisson   Negbin2   Poisson     Negbin2
                       Coeff     Coeff     st.error    st.error
ONE                   -2.224    -2.190      .190        .222
SEX                     .157      .217      .056*       .066*
AGE (years/100)        1.056     -.216     1.001       1.233
AGESQ                  -.849      .609     1.078       1.380
INCOME ($10,000)       -.205     -.142      .088*       .098*
ILLNESS                 .187      .214      .018*       .026*
ACTDAYS                 .127      .144      .005*       .008*
HSCORE                  .030      .038      .010*       .014*
CHCOND1 (not limit)     .114      .099      .066        .077
CHCOND2 (not limit)     .141      .190      .083        .095
alpha                    -       1.077       -          .098*


3B. MIXTURE MODELS

• Mixture motivation for the negative binomial model is to assume

$$y|\lambda \sim \text{Poisson}(\lambda),$$

where $\lambda = \mu\nu$ is the product of two components:
  - observed individual heterogeneity $\mu = \exp(x'\beta)$
  - unobserved individual heterogeneity $\nu \sim \text{Gamma}[1, \alpha]$

• Integrating out, $h(y|\mu) = \int f(y|\mu,\nu)\,g(\nu)\,d\nu$ gives

$$y|\mu \sim \text{Negative Binomial}[\mu,\ \mu + \alpha\mu^2].$$


3B. MIXTURE MODELS (continued)

• A wide range of models, called mixture models, can be generated by specifying different distributions of $\nu$, e.g. Poisson-Inverse Gaussian.

• Even if there is no closed form solution, can estimate using
  - numerical integration, e.g. Gaussian quadrature, or
  - Monte Carlo integration, e.g. maximum simulated likelihood:

$$h(y|\mu) = \int f(y|\mu,\nu)\,g(\nu)\,d\nu \simeq \frac{1}{S}\sum_{s=1}^{S} f(y|\mu,\nu^{(s)}),$$

where $\nu^{(s)}$, $s = 1, \ldots, S$, are $S$ independent draws from $g(\nu)$ and $S \to \infty$.
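The Monte Carlo approximation can be checked in the one case where the integral has a closed form: gamma heterogeneity with mean 1 and variance $\alpha$, which yields the Negbin 2 density. The sketch below is illustrative only; the parameter values and function names are assumptions, and NumPy's gamma generator is parameterized by shape and scale, so shape $= 1/\alpha$ and scale $= \alpha$ give mean 1 and variance $\alpha$.

```python
import math
import numpy as np

def poisson_pmf(y, lam):
    """Poisson density, vectorized over the rate lam."""
    return np.exp(-lam) * lam ** y / math.factorial(y)

def negbin2_pmf(y, mu, alpha):
    """Closed-form Negbin 2 density used as the benchmark."""
    a_inv = 1.0 / alpha
    log_p = (math.lgamma(y + a_inv) - math.lgamma(y + 1) - math.lgamma(a_inv)
             + a_inv * math.log(a_inv / (a_inv + mu))
             + y * math.log(mu / (a_inv + mu)))
    return math.exp(log_p)

rng = np.random.default_rng(2)
mu, alpha, S = 1.5, 0.8, 200_000                          # hypothetical values
nu = rng.gamma(shape=1.0 / alpha, scale=alpha, size=S)    # draws with mean 1, variance alpha

# Simulated density: average the Poisson density over the draws of nu
simulated = np.array([poisson_pmf(y, mu * nu).mean() for y in range(6)])
exact = np.array([negbin2_pmf(y, mu, alpha) for y in range(6)])

for y in range(6):
    print(y, round(simulated[y], 4), round(exact[y], 4))
```

In maximum simulated likelihood the same averaging is done inside the log-likelihood for each observation, with the draws held fixed across iterations.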


3C. LEFT-TRUNCATION AT ZERO

• Sampling rule is such that we observe only positive counts.

• Untruncated density is $f(y|x,\theta)$, e.g. Negbin2.

• Truncated density is

$$f(y|x,\theta, y > 0) = \frac{f(y|x,\theta)}{\Pr[y > 0|x,\theta]} = \frac{f(y|x,\theta)}{1 - f(0|x,\theta)}.$$

• Estimate by MLE.
• Inconsistent if any aspect of the model is misspecified.
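A minimal sketch of the zero-truncated density, using a Poisson parent for simplicity (the slide's example parent is Negbin2; the Poisson choice, the value of `mu` and the function names are assumptions for illustration). It checks that the truncated density sums to one and that its mean is $\mu/(1 - e^{-\mu})$, which exceeds the untruncated mean.

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def truncated_pmf(y, mu):
    """Density of y given y > 0 when the parent density is Poisson(mu)."""
    if y < 1:
        return 0.0
    return poisson_pmf(y, mu) / (1.0 - poisson_pmf(0, mu))

mu = 0.8  # hypothetical parent mean
total = sum(truncated_pmf(y, mu) for y in range(1, 60))
trunc_mean = sum(y * truncated_pmf(y, mu) for y in range(1, 60))

print(f"sum = {total:.6f}, truncated mean = {trunc_mean:.6f}, parent mean = {mu}")
```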


3D. RIGHT-CENSORING AT c

• Sampling rule is that we observe only $0, 1, 2, \ldots, c-1$, and "$c$ or more".

• Uncensored density is $f(y|x,\theta)$ and cdf is $F(y|x,\theta)$, e.g. Negbin2.

• Censored density is

$$\begin{cases} f(y|x,\theta) & y \le c-1 \\ 1 - F(c-1|x,\theta) & y = c \end{cases}$$

• Estimate by MLE.
• Inconsistent if any aspect of the model is misspecified.
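A short sketch of the right-censored density, again with a Poisson parent as a simplifying assumption (names and parameter values are illustrative). The mass assigned to the top category is the probability of $c$ or more, i.e. one minus the cdf evaluated at $c-1$.

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def censored_pmf(y, mu, c):
    """Density when counts of c or more are all recorded as c."""
    if y < c:
        return poisson_pmf(y, mu)
    if y == c:
        return 1.0 - sum(poisson_pmf(j, mu) for j in range(c))  # Pr[y >= c]
    return 0.0

mu, c = 2.0, 4  # hypothetical mean and censoring point
total = sum(censored_pmf(y, mu, c) for y in range(c + 1))
tail = sum(poisson_pmf(j, mu) for j in range(c, 100))  # direct sum of the upper tail

print(f"sum = {total:.6f}, Pr[y >= c] = {censored_pmf(c, mu, c):.6f}")
```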


3E. HURDLE MODEL or TWO-PART MODEL

• Suppose the process for zeros differs from that for nonzeros.

• Density is

$$f(y|x_1,x_2,\theta_1,\theta_2) = \begin{cases} f_1(0|x_1,\theta_1) & y = 0 \\[4pt] \dfrac{1 - f_1(0|x_1,\theta_1)}{1 - f_2(0|x_2,\theta_2)} \times f_2(y|x_2,\theta_2) & y \ge 1 \end{cases}$$

• Estimate by MLE.
• Inconsistent if any aspect of the model is misspecified.
• Hurdle negative binomial often works well.
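The hurdle density can be sketched with two Poisson components, one governing the zero/nonzero split and one governing the positive counts. Poisson components are an assumption made to keep the example short (the slide suggests negative binomial in practice), and the parameter values are hypothetical. The sketch checks that the density sums to one and that it collapses to the ordinary Poisson when both parts share the same mean.

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def hurdle_pmf(y, mu1, mu2):
    """Hurdle density: f1 = Poisson(mu1) sets Pr[y = 0]; f2 = Poisson(mu2) drives positives."""
    p_zero = poisson_pmf(0, mu1)
    if y == 0:
        return p_zero
    scale = (1.0 - p_zero) / (1.0 - poisson_pmf(0, mu2))
    return scale * poisson_pmf(y, mu2)

mu1, mu2 = 0.25, 1.5  # hypothetical: many zeros, but sizeable counts once positive
total = sum(hurdle_pmf(y, mu1, mu2) for y in range(80))
print(f"sum = {total:.6f}, Pr[y = 0] = {hurdle_pmf(0, mu1, mu2):.4f}")

# With identical parts the hurdle model reduces to the parent Poisson
same = max(abs(hurdle_pmf(y, 1.0, 1.0) - poisson_pmf(y, 1.0)) for y in range(20))
```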


3F. WITH-ZEROS MODEL

• Suppose there is an extra reason for zeros.

• Density is

$$f(y|x_1,x_2,\theta_1,\theta_2) = \begin{cases} f_1(0|x_1,\theta_1) + [1 - f_1(0|x_1,\theta_1)] \times f_2(0|x_2,\theta_2) & y = 0 \\[4pt] [1 - f_1(0|x_1,\theta_1)] \times f_2(y|x_2,\theta_2) & y \ge 1 \end{cases}$$

• Estimate by MLE.
• Inconsistent if any aspect of the model is misspecified.
• Not used much in econometrics.
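A sketch of the with-zeros density, where the probability of an "extra" zero (the role played by $f_1(0|\cdot)$ above) is a single number `pi0` and the count component is Poisson. Both simplifications, and the parameter values, are assumptions for illustration. The example confirms the density sums to one, that zeros are more frequent than under the Poisson component alone, and that the mean is scaled down to $(1-\pi_0)\mu$.

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def with_zeros_pmf(y, pi0, mu):
    """With-zeros density: extra zero with probability pi0, else a Poisson(mu) count."""
    base = (1.0 - pi0) * poisson_pmf(y, mu)
    return pi0 + base if y == 0 else base

pi0, mu = 0.3, 1.2  # hypothetical values
total = sum(with_zeros_pmf(y, pi0, mu) for y in range(80))
mean = sum(y * with_zeros_pmf(y, pi0, mu) for y in range(80))

print(f"sum = {total:.6f}, mean = {mean:.4f}")
print(f"Pr[y = 0]: with-zeros {with_zeros_pmf(0, pi0, mu):.4f} vs Poisson {poisson_pmf(0, mu):.4f}")
```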


3F. FINITE MIXTURES MODEL

• Density is a weighted sum of two (or more) densities.

• Density is

$$f(y|x_1,x_2,\theta_1,\theta_2,\pi_1) = \pi_1 f_1(y|x_1,\theta_1) + (1-\pi_1) f_2(y|x_2,\theta_2).$$

• Estimate by MLE.
• Inconsistent if any aspect of the model is misspecified.

• Permits flexible models, e.g. bimodal from Poissons.
• Can be viewed as a finite mixture model.


3G. LATENT CLASS MODEL

• Observation is drawn from one of two (or more) densities, where we don't know which density it is drawn from.

• Let $d_1 = 1$ if type 1 and $d_1 = 0$ otherwise, and $d_2 = 1$ if type 2 and $d_2 = 0$ otherwise.

• Density is

$$f(y|x_1,x_2,\theta_1,\theta_2,\pi_1,\pi_2) = \prod_{j=1}^{2} \left[\pi_j f_j(y|x_j,\theta_j)\right]^{d_j}.$$

• Estimate by ML using the EM algorithm, as $d_j$ is not observed.
• Nice interpretation, e.g. "sick" type and "healthy" type, where people have a probability of being drawn from either type.

• Similar to unobserved heterogeneity in duration data models.


3H. MODEL EVALUATION

• Formal tests for overdispersion or underdispersion exist.
• Various R-squareds for count data models have been proposed.
• For complete data on $y$, the choice between a fully parametric approach and moment-based estimators depends on whether we want to predict count probabilities rather than just the mean.

• For fully parametric models:
  - Choice between nested models using likelihood ratio tests.
  - Choice between non-nested mixture models using Akaike's information criterion and extensions.

• Calculate a predicted frequency distribution as the average over observations of the predicted probabilities for each count. Compare this to the observed frequency distribution.
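The predicted-versus-observed frequency comparison can be sketched as follows. The data are simulated with gamma heterogeneity, and the Poisson probabilities are evaluated at the true conditional means as a stand-in for fitted means; both are assumptions for illustration. With overdispersed data the Poisson model underpredicts zeros.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n = 5000
mu = np.exp(-1.0 + 0.5 * rng.normal(size=n))   # stand-in for fitted means exp(x_i' beta-hat)
y = rng.poisson(mu * rng.gamma(shape=1.0, scale=1.0, size=n))  # overdispersed counts

max_count = 5
# Observed relative frequency of each count 0, ..., max_count
observed = np.array([(y == k).mean() for k in range(max_count + 1)])
# Predicted frequency: average over observations of the Poisson probability of each count
predicted = np.array([(np.exp(-mu) * mu ** k / math.factorial(k)).mean()
                      for k in range(max_count + 1)])

for k in range(max_count + 1):
    print(k, round(observed[k], 3), round(predicted[k], 3))
```

Replacing the Poisson probabilities with those of a richer model (e.g. Negbin 2) gives the corresponding comparison for that model.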


4A. TIME SERIES DATA

• Examples are number of strikes and number of trades of a given stock in a one-hour period.

• Many different approaches are possible:
  - Integer valued ARMA: e.g. INAR(1) is $y_t = \rho \circ y_{t-1} + \varepsilon_t$, where $\rho \circ y_{t-1}$ is the number of successes in $y_{t-1}$ trials, $\rho$ is the probability of success in one trial, and $\varepsilon_t$ is Poisson.
  - Autoregressive: e.g. AR(1) is $y_t \sim \text{Poisson}(\rho\, y_{t-1})$ with adjustment if $y_{t-1} = 0$.
  - Serially-correlated error models.
  - State-space models: $y_t \sim \text{Poisson}(\mu_t)$ and $\mu_t = g(\mu_{t-1})$.
  - Hidden-Markov models: different models in different regimes with Markov transition probabilities.
  - Discrete ARMA models.
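Of the approaches listed, the INAR(1) model is easy to simulate: the "number of successes in $y_{t-1}$ trials" is a binomial draw, and the innovation is Poisson. The sketch below uses hypothetical parameter values (`rho` for the success probability, `lam` for the innovation mean) and checks two standard properties of the stationary process: mean $\lambda/(1-\rho)$ and first-order autocorrelation $\rho$.

```python
import numpy as np

rng = np.random.default_rng(4)
rho, lam, T = 0.5, 1.0, 20_000   # hypothetical thinning probability, innovation mean, length

y = np.empty(T, dtype=np.int64)
y[0] = rng.poisson(lam / (1.0 - rho))            # start from the stationary mean
for t in range(1, T):
    survivors = rng.binomial(y[t - 1], rho)      # successes in y_{t-1} trials
    y[t] = survivors + rng.poisson(lam)          # add the Poisson innovation

mean_y = y.mean()
acf1 = np.corrcoef(y[:-1], y[1:])[0, 1]          # lag-1 autocorrelation
print(f"mean = {mean_y:.3f} (theory {lam / (1 - rho):.3f}), lag-1 acf = {acf1:.3f} (theory {rho})")
```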


4B. MULTIVARIATE DATA

• Example is number of doctor visits and number of hospital stays.
• Multivariate Poisson and NB exist but are too restrictive.
• GMM approach generalizes SUR to variance a multiple of the mean, e.g.
  - $E[y_{ji}|x_{ji}] = \exp(x_{ji}'\beta_j)$ for $j = 1, 2$,
  - $V[y_{ji}|x_{ji}] = \phi_j \exp(x_{ji}'\beta_j)$,
  - $\text{Cov}[y_{1i}, y_{2i}|x_{1i}, x_{2i}] = \rho \exp(x_{1i}'\beta_1)^{1/2} \exp(x_{2i}'\beta_2)^{1/2}$.

• Parametric approach induces correlation through a common latent variable, e.g. $y_{ji}|x_{ji} \sim \text{Poisson}(\exp(x_{ji}'\beta_j + \nu_i))$ where $\nu_i \sim g(\nu)$. Estimation is by simulated ML if there is no closed form solution.


4C. PANEL DATA

• Number of patent applications by company in several years.

• Now have $(y_{it}, x_{it})$, $i = 1, \ldots, n$; $t = 1, \ldots, T$.

• Consider only a short panel where $T$ is small and $n \to \infty$.


4C. PANEL REVIEW: LINEAR MODEL FOR PANEL DATA

• Model with individual-specific effect is

$$y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}.$$

• Different people have a different unobserved intercept $\alpha_i$.

• We want to consistently estimate the slope parameters $\beta$.


4C. PANEL REVIEW: LINEAR MODEL - RANDOM EFFECTS

• The approach in most applied statistics.
  - $\alpha_i$ is independent of regressors with mean 0 and variance $\sigma_\alpha^2$.
  - Then do feasible GLS to get efficient estimates.
  - Or even do OLS, but make sure to get correct standard errors that control for within-individual clustering.

• Can extend to richer random effects models.


4C. PANEL REVIEW: LINEAR MODEL - FIXED EFFECTS

• The approach in econometrics.
  - $\alpha_i$ may be correlated with regressors.
  - e.g. high $\alpha_i$ means high unobserved propensity to see a doctor. May also mean likely to have generous insurance.

• More fundamental problem: OLS and GLS are inconsistent.
• Solution is to difference out $\alpha_i$.


4C. PANEL REVIEW: LINEAR MODEL - FIXED EFFECTS (cont)

• Either look at deviations from the individual mean, i.e. deviation of doctor visits this year from the individual's average:

$$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$$

• Or look at deviations from last year for the individual, i.e. deviation of doctor visits this year from the individual's visits last year:

$$y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1})$$


4C. PANEL DATA: POISSON MODEL

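• Poisson panel model is

$$\begin{aligned} f(y_{it}|x_{it},\beta,\alpha_i) &\sim \text{Poisson}[\mu_{it} = \alpha_i\lambda_{it}] \\ &\sim \text{Poisson}[\mu_{it} = \alpha_i \exp(x_{it}'\beta)] \\ &\sim \text{Poisson}[\mu_{it} = \exp(\ln\alpha_i + x_{it}'\beta)] \end{aligned}$$

where $\alpha_i$ is unobserved and possibly correlated with $x_{it}$.

• So the usual mean $\lambda_{it}$ is rescaled by a time-invariant multiple $\alpha_i$.

• The two key issues are
  - correct standard errors allowing for clustering via $\alpha_i$
  - consistent estimates of $\beta$ if $\alpha_i$ is correlated with $x_{it}$.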


4C. PANEL DATA: POISSON MOMENT-BASED ESTIMATION

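• Assume regressors $x_{it}$ are strictly exogenous, so

$$E[y_{it}|x_{i1},\ldots,x_{iT},\alpha_i] = \alpha_i\lambda_{it}.$$

• Average over $t$ for given $i$:

$$E[\bar{y}_i|x_{i1},\ldots,x_{iT},\alpha_i] = \alpha_i\bar{\lambda}_i.$$

• So

$$E\left[\left(y_{it} - (\lambda_{it}/\bar{\lambda}_i)\,\bar{y}_i\right) \,\middle|\, x_{i1},\ldots,x_{iT}\right] = 0.$$

• Thus

$$E\left[x_{it}\left(y_{it} - \frac{\lambda_{it}}{\bar{\lambda}_i}\,\bar{y}_i\right)\right] = 0.$$

• $\hat\beta_{GMM}$ solves the corresponding sample moment conditions

$$\sum_{i=1}^{n}\sum_{t=1}^{T} x_{it}\left(y_{it} - \frac{\lambda_{it}}{\bar{\lambda}_i}\,\bar{y}_i\right) = 0, \quad \text{where } \lambda_{it} = \exp(x_{it}'\beta).$$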
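The sample moment conditions above can be solved by Newton's method. Writing $p_{it} = \lambda_{it}/\sum_s \lambda_{is}$, the moment is $\sum_i\sum_t x_{it}(y_{it} - p_{it}\sum_s y_{is})$, and its derivative with respect to $\beta$ is minus a weighted within-individual covariance of $x_{it}$. The sketch below is one way to code this under stated assumptions: the simulated panel, the function names and the parameter values are all illustrative, and the individual effect is deliberately correlated with the regressor.

```python
import numpy as np

def fe_moment(beta, X, y):
    """Sample moment sum_i sum_t x_it (y_it - (lam_it / lam_bar_i) y_bar_i); X is (n, T, k)."""
    lam = np.exp(X @ beta)                          # (n, T)
    p = lam / lam.sum(axis=1, keepdims=True)        # lam_it / (T * lam_bar_i)
    resid = y - p * y.sum(axis=1, keepdims=True)    # quasi-difference
    return np.einsum("ntk,nt->k", X, resid), p

def fe_poisson(X, y, max_iter=50, tol=1e-10):
    """Newton iterations on the moment condition (no intercept: it is absorbed by alpha_i)."""
    beta = np.zeros(X.shape[2])
    for _ in range(max_iter):
        g, p = fe_moment(beta, X, y)
        x_bar_p = np.einsum("nt,ntk->nk", p, X)     # p-weighted mean of x_it for each i
        Xc = X - x_bar_p[:, None, :]
        w = p * y.sum(axis=1, keepdims=True)
        H = np.einsum("ntk,nt,ntj->kj", Xc, w, Xc)  # minus the Jacobian of the moment
        step = np.linalg.solve(H, g)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated short panel with alpha_i correlated with the regressor (hypothetical design)
rng = np.random.default_rng(5)
n, T, beta_true = 2000, 4, np.array([0.5])
z = rng.normal(size=(n, 1))
X = (z + rng.normal(size=(n, T)))[:, :, None]       # x_it = z_i + noise
alpha = np.exp(z)                                   # individual effect depends on z_i
y = rng.poisson(alpha * np.exp(X @ beta_true))

beta_hat = fe_poisson(X, y)
g_hat, _ = fe_moment(beta_hat, X, y)
print("beta_hat:", beta_hat, "moment at beta_hat:", g_hat)
```

The individual effects never need to be estimated: they cancel in the ratio $\lambda_{it}/\bar{\lambda}_i$.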


4C. PANEL DATA: POISSON MOMENT-BASED ESTIMATION (continued)

• Similar to the linear model, except that instead of working with the difference $(y_{it} - \bar{y}_i)$ we consider the quasi-difference $(y_{it} - [\lambda_{it}/\bar{\lambda}_i]\,\bar{y}_i)$.

• Similar qualitative conclusions to the linear model:
  - Consistency of $\hat\beta_{GMM}$ requires only correct specification of the mean!
  - Consistent for $\beta$ in either the fixed effects or the random effects model.

• Robust inference is based on standard errors that do not require mean = variance.


4C. PANEL DATA: POISSON FIXED EFFECTS

• Additionally assume the Poisson distribution, and both $\beta$ and $\alpha_i$ are parameters to be estimated.

• Get the fixed effects MLE of $\beta$ alone by concentrating out $\alpha_i$. Some math yields $\hat\beta_{ML} = \hat\beta_{GMM}$!

• Or do conditional MLE based on the conditional density $f(y_{i1},\ldots,y_{iT}|\bar{y}_i)$. Then $\hat\beta_{CML} = \hat\beta_{GMM}$!

• Robust inference is based on standard errors that do not require mean = variance.


4C. PANEL DATA: POISSON RANDOM EFFECTS

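• Assume the Poisson distribution and that the $\alpha_i$ are i.i.d. gamma distributed with mean 1 and variance $1/\delta$.

• Obtaining the MLE of $\beta$ and $\delta$ yields f.o.c. for $\beta$ of

$$\sum_{i=1}^{n}\sum_{t=1}^{T} x_{it}\left(y_{it} - \lambda_{it}\,\frac{\bar{y}_i + \delta/T}{\bar{\lambda}_i + \delta/T}\right) = 0, \quad \text{where } \lambda_{it} = \exp(x_{it}'\beta).$$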


4C. PANEL DATA: OTHER COUNT MODELS

• Fixed and random effects for the negative binomial also exist. But efficiency gains may not be great.

• For fixed effects models, use the preceding moment-based estimator with robust standard errors.

• For random effects this estimator is also consistent. Or can assume flexible distributions. Even if there is no closed form solution for the density, can use simulation methods.

• For dynamic models, i.e. lagged dependent variable as regressor, instead use the quasi-difference

$$E\left[\left(y_{it} - (\lambda_{it-1}/\lambda_{it})\,y_{it-1}\right) \,\middle|\, y_{it-1},\ldots,y_{i1},x_{it},\ldots,x_{i1}\right] = 0,$$

analogous to working with $(y_{it} - y_{it-1})$ in the linear model.


4D. SAMPLE SELECTION

• Suppose the process for zeros differs from that for nonzeros, e.g. the decision to visit a doctor or not differs from the process for further visits.

• Generalize the two-part model (or hurdle model) to permit correlation in unobservables across the two parts, similar to generalized tobit. Not done much.


4E. ENDOGENEITY

• Problem is

$$E[(y_i - \exp(x_i'\beta))|x_i] \neq 0.$$

• Assume existence of instruments $z_i$ such that

$$E[(y_i - \exp(x_i'\beta))|z_i] = 0.$$

• Then if $\dim[z_i] = \dim[\beta]$, estimate $\beta$ by solving

$$\sum_{i=1}^{n} (y_i - \exp(x_i'\beta))\,z_i = 0.$$

• And if $\dim[z_i] > \dim[\beta]$, then use GMM.


4F. SEMIPARAMETRIC

• Focus on estimating the conditional mean.

• Most generally $E[y_i|x_i] = g(x_i)$, and estimate the function $g(\cdot)$.

• Kernel regression works well in one dimension.
• In higher dimensions need more structure, e.g. the single-index form $E[y_i|x_i] = g(x_i'\beta)$.

• Flexible parametric methods may be an alternative, e.g. series expansions.


4G. BAYESIAN

• Poisson with gamma prior yields a closed form solution.

• But can now use richer models, e.g. negative binomial and normal prior, and compute using MCMC methods.


5. SUMMARY OF COUNT REGRESSION

• For cross-section count data the basic approaches are
  - Moment-based: let $E[y|x] = \exp(x'\beta)$ and do Poisson QMLE with robust s.e.'s.
  - Fully parametric: MLE of richer models than Poisson.

• For panel count data
  - Specify a multiplicative individual-specific effect.
  - Moment-based: estimation based on the quasi-difference $E\left[\left(y_{it} - (\lambda_{it}/\bar{\lambda}_i)\,\bar{y}_i\right) | x_{i1},\ldots,x_{iT}\right] = 0$ with robust s.e.'s.
  - Fully parametric: MLE of richer models than Poisson-gamma.
  - Use $E\left[\left(y_{it} - (\lambda_{it-1}/\lambda_{it})\,y_{it-1}\right) | y_{it-1},\ldots,y_{i1},x_{it},\ldots,x_{i1}\right] = 0$ if the model is dynamic.


5. SUMMARY OF COUNT REGRESSION (continued)

• The cross-section and static panel count models can be estimated in STATA, LIMDEP and TSP.

• Count methods also exist (though no off-the-shelf programs) for the usual complications:
  - Time series data
  - Multivariate data
  - Measurement error
  - Sample selection
  - Endogenous regressors
  - Semiparametric approach
  - Bayesian approach.


6. REFERENCES [Recent Examples plus some classics]

1. Books
Cameron, A.C., and P.K. Trivedi (1998), Regression Analysis of Count Data, Econometric Society Monograph No. 30, Cambridge University Press.
Winkelmann, R. (2000), Econometric Analysis of Count Data, 3rd edition, Springer.

2 and 3A. Cross-Section Poisson and Negative Binomial
Cameron, A.C., and P.K. Trivedi (1986), "Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators," Journal of Applied Econometrics, 1, 29-53.

3E. Hurdle Model or Two-Part Model and With-Zeroes
Mullahy, J. (1986), "Specification and Testing of Some Modified Count Data Models," Journal of Econometrics, 33, 341-365.

3F. Finite Mixture Models
Deb, P., and P.K. Trivedi (1997), "Demand for Medical Care by the Elderly: A Finite Mixture Approach," Journal of Applied Econometrics, 12(3), 313-36.

3G. Latent Class Models
Deb, P., and P.K. Trivedi (2001), "The Structure of Demand for Health Care: Latent Class versus Two-part Models," Journal of Health Economics, forthcoming.

4A. Time Series Data
Brannas, K., and J. Hellstrom (2001), "Generalized Integer-Valued Autoregression," Econometric Reviews, 20(4), 425-43.

4B. Multivariate Data
Trivedi, P.K., and M.K. Munkin (1999), "Simulated Maximum Likelihood Estimation of Multivariate Mixed-Poisson Regression Models, with Application," Econometrics Journal, 2(1), 29-48.

4C. Panel Data
Hausman, J.A., B.H. Hall and Z. Griliches (1984), "Econometric Models for Count Data With an Application to the Patents-R and D Relationship," Econometrica, 52, 909-938.
Blundell, R., R. Griffith and F. Windmeijer (2002), "Individual Effects and Dynamics in Count Data," Journal of Econometrics, 108, 113-131.
Windmeijer, F. (2002), "EXPEND, A Gauss programme for non-linear GMM estimation of exponential models with endogenous regressors for cross section and panel (dynamic) count data models," cemmap working paper CWP14/02.

4D. Sample Selection
Winkelmann, R. (1998), "Count Data Models with Selectivity," Econometric Reviews, 17(4), 339-59.

4E. Endogeneity
Mullahy, J. (1997), "Instrumental Variable Estimation of Poisson Regression Models: Application to Models of Cigarette Smoking Behavior," Review of Economics and Statistics, 79, 586-593.
Windmeijer, F. (2000), "Moment Conditions for Fixed Effects Count Data Models with Endogenous Regressors," Economics Letters, 68(1), 21-24.

4G. Bayesian
Chib, S., E. Greenberg and R. Winkelmann (1998), "Posterior Simulation and Bayes Factors in Panel Count Data," Journal of Econometrics, 86(1), 33-54.
