Microeconometrics Count Data Models: Example Derya Uysal Department of Economics University of Munich Summer 2017


Microeconometrics

Count Data Models: Example

Derya Uysal

Department of Economics

University of Munich

Summer 2017


Rainer Winkelmann (2004): Health care reform and the number of doctor

visits: An econometric analysis. Journal of Applied Econometrics, 19, 455–472.

Microeconometrics SoSe 2017, 1


What “big picture” issues does the paper address?

What is the effect of price (here, health insurance copayments) on the

demand for health services?

• The number of doctor visits (in a certain period) is often used as a

measure of the demand for health services.

• The number of doctor visits is a count variable with many zeros.

• This paper compares various models for count variables with excess zeros.

It addresses both a methodological issue and answers a substantive

research question.


Which models does the author compare?

• Regression models for count data

• Poisson model

• Structural models:

“hurdle” version

“finite mixture” version

• An alternative hurdle model

• All models use a linear index assumption for the effect of the covariates

(equation on page 458).


Institutional background, reform, and data sources

Possible explanations for rising health care costs

• Preferences (health as a superior or luxury good) when income rises

• Population aging, increased life expectancy

• Technological progress brings more expensive treatments

• Incentive problems (moral hazard)

• This paper studies a reform specifically targeted at incentive problems


Institutional background, reform, and data sources

The 1997 health care reform in Germany

• This was only one reform of many in the past 25 years.

• The main element was an increase in the copayments (Zuzahlungen) for

prescription drugs and other services:

• Relative increase depends on the package size:

small: 3 DM → 9 DM

medium: 5 DM → 11 DM

large: 7 DM → 13 DM

• There were some exceptions for hardship cases.

• The reform became effective July 1st, 1997.
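The point that the relative increase depends on the package size can be checked with a few lines of arithmetic. The sketch below (in Python, used here only for illustration) takes the DM amounts from the list above; the variable and function names are not from the paper.

```python
# Copayments in DM before and after the 1997 reform, by package size (from the slide).
copayments = {"small": (3, 9), "medium": (5, 11), "large": (7, 13)}


def relative_increase(before, after):
    """Percentage increase of the copayment relative to its pre-reform level."""
    return (after - before) / before * 100


increases = {size: relative_increase(*pair) for size, pair in copayments.items()}

for size, pct in increases.items():
    print(f"{size}: +{pct:.1f}%")
```

The absolute increase is 6 DM in every case, so the relative increase is largest for small packages (about 200%) and smallest for large ones (about 86%).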


Institutional background, reform, and data sources

Data

• The data come from the German Socio-Economic Panel (GSOEP, now

just called SOEP)

• Panel data covering 5 years, 1995–1999 (two before, two after the reform

year)

Earlier research

• No reliable analysis of the effects of this reform (and similar ones) existed

before Winkelmann’s paper.

• The study by Lauterbach et al. (2000) he cites is poorly designed.


The Poisson model and its shortcomings

• The starting point of the analysis is the Poisson regression model for count

data – the same model we discussed in the lecture:

$$\Pr[y_i \mid \lambda_i] = \frac{e^{-\lambda_i}\,\lambda_i^{y_i}}{y_i!},$$

using the standard linear index parameterization, $\lambda_i = \exp(x_i'\beta)$.

• In the interpretation of the results, Winkelmann focuses on the relative

change in expected doctor visits before and after the reform:

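$$\Delta\%(98,96) = \left[\frac{E[y_{i,98} \mid x_{i,98}]}{E[y_{i,96} \mid x_{i,96}]} - 1\right] \times 100 = \left[\exp(\beta_{98} - \beta_{96}) - 1\right] \times 100,$$

where $\beta_{96}$ and $\beta_{98}$ are the coefficients of the year dummies and all other covariates are held fixed.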
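A minimal Python sketch of the two formulas on this slide may help. Because the expected value is exp(x'β), the ratio of expected visits in 1998 and 1996 at fixed covariates depends only on the difference of the two year-dummy coefficients. The coefficient values below are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
import math


def poisson_pmf(y, lam):
    """Pr[y | lam] = exp(-lam) * lam**y / y!"""
    return math.exp(-lam) * lam**y / math.factorial(y)


def relative_change(beta_98, beta_96):
    """Percentage change in expected visits, 1998 vs. 1996, holding x fixed."""
    return (math.exp(beta_98 - beta_96) - 1) * 100


# Hypothetical year-dummy coefficients (illustration only).
beta_96, beta_98 = 0.0, -0.10
change = relative_change(beta_98, beta_96)
print(f"relative change: {change:.2f}%")

# Sanity check: the Poisson probabilities sum to one.
total = sum(poisson_pmf(y, 2.5) for y in range(100))
```

With these made-up values the change is about -9.5%, slightly smaller in magnitude than the coefficient difference of -0.10, because exp(b) - 1 is only approximately equal to b.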


The Poisson model and its shortcomings

• Winkelmann discusses three shortcomings (page 460):

1. The Poisson regression model does not allow for unobserved heterogeneity.

2. The Poisson regression model ignores the panel structure of the data (five observations per individual). If there is unobserved individual heterogeneity, the sample is not i.i.d. any more. Thus,

• either standard errors have to be adjusted,

• or panel data versions of count data models must be used (fixed or random

effects).

3. The (single) linear index structure might be too restrictive.


Alternatives to the Poisson regression model

Structural models (1): The hurdle model

• The hurdle model, due to Mullahy (1986), combines a binary model for

the health care use decision with a count data model that is truncated

from the left at 1.

• The binary variable d indicates whether a person has not seen a doctor:

$$d_i = 1 - \min(1, y_i)$$

• The density of an observation is then given by

$$f(y_i) = f_{1i}^{\,d_i}\,\bigl[(1 - f_{1i})\, f_T(y_i \mid y_i > 0)\bigr]^{1 - d_i},$$

where $f_{1i} = \Pr[d_i = 1]$ and $f_T(y_i \mid y_i > 0) = \dfrac{f_2(y_i)}{1 - f_2(0)}$.
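The hurdle density can be sketched in Python as follows, assuming a Poisson choice for f2 and treating the zero probability f1 as a plain number rather than a function of covariates. The parameter values are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """Parent count density f2: the standard Poisson probability."""
    return math.exp(-lam) * lam**y / math.factorial(y)


def hurdle_poisson_pmf(y, p_zero, lam):
    """Hurdle density: p_zero = Pr[d = 1] = Pr[y = 0];
    positive counts follow a Poisson truncated from the left at 1."""
    if y == 0:
        return p_zero
    truncated = poisson_pmf(y, lam) / (1.0 - poisson_pmf(0, lam))
    return (1.0 - p_zero) * truncated


# Hypothetical values: 35% non-users, Poisson parameter 2 for the positive part.
p_zero, lam = 0.35, 2.0
total = sum(hurdle_poisson_pmf(y, p_zero, lam) for y in range(100))
print(f"probabilities sum to {total:.6f}")
```

The zero probability and the shape of the positive part are governed by separate parameters, which is what lets the model fit more (or fewer) zeros than a single Poisson would imply.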


Alternatives to the Poisson regression model

• As before, the functions f1 and f2 must be chosen parametrically.

• Winkelmann lists various possibilities on pp. 460–1.

• Note in particular that f1 might be standard normal, as in the Probit

model, but it need not be.

• Choices for f2 include the Poisson and the negative binomial.

• The function f1 can be chosen so that the hurdle model nests the

non-hurdle model (say, the standard Poisson model) as a special case, so

that the validity of the latter can be tested.

• The model has a “structural” interpretation as it reflects a two-step

decision process.
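One way to see the nesting property, using the notation of the hurdle density on the previous slide: if the hurdle probability equals the zero probability of the parent count model, the hurdle density collapses to f2. This is a sketch of the general argument, not the specific parameterization used in the paper.

```latex
\begin{align*}
\text{Suppose } f_{1i} &= f_2(0). \\
y_i = 0:\quad f(y_i) &= f_{1i} = f_2(0), \\
y_i > 0:\quad f(y_i) &= (1 - f_{1i})\,\frac{f_2(y_i)}{1 - f_2(0)} = f_2(y_i).
\end{align*}
```

For a Poisson f2 the restriction reads f1 = exp(-λ), so a test of this restriction is a test of the standard Poisson model against the hurdle alternative.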


Alternatives to the Poisson regression model

Structural models (2): The finite mixture model

• An alternative to the hurdle model, suggested by Deb and Trivedi (2002),

is a finite mixture model.

• This approach assumes that all individuals’ doctor visits can be described

by a count process, but that the intensity of this process varies across

individuals (say, healthy and sick people).

• This finite mixture model has the general structure we will discuss in

chapter 8:

$$f(y_i \mid \theta) = \sum_{j=1}^{s} \pi_j\, f_j(y_i \mid \theta_j)$$

• The function fj is a parametric modelling choice. The number of classes,

here denoted by s, needs to be determined (say, s = 2).
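As an illustration of the mixture structure, the following Python sketch evaluates a two-class mixture (s = 2) with Poisson components. The class shares and intensities are hypothetical and are not the estimates reported in the paper.

```python
import math


def poisson_pmf(y, lam):
    """Component density f_j: a Poisson probability with intensity lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)


def mixture_pmf(y, weights, lams):
    """f(y) = sum_j pi_j * f_j(y | lam_j)."""
    return sum(w * poisson_pmf(y, lam) for w, lam in zip(weights, lams))


# Hypothetical classes: two-thirds low users, one-third high users.
weights = [2 / 3, 1 / 3]
lams = [1.0, 4.0]

mixture_mean = sum(w * lam for w, lam in zip(weights, lams))
total = sum(mixture_pmf(y, weights, lams) for y in range(100))
p_zero_mixture = mixture_pmf(0, weights, lams)
p_zero_poisson = poisson_pmf(0, mixture_mean)

print(f"mean = {mixture_mean:.2f}")
print(f"Pr[y=0]: mixture = {p_zero_mixture:.3f}, Poisson with same mean = {p_zero_poisson:.3f}")
```

With these values the mixture puts noticeably more mass on zero than a single Poisson with the same mean, which is one way such a model can accommodate excess zeros.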


Alternatives to the Poisson regression model

Yet another model: The Poisson log-normal hurdle model

• Winkelmann proposes another model (page 462):

“[. . . ] it is frequently found that the Poisson-log-normal model (which can

be derived assuming a Poisson model with unobserved heterogeneity in the

linear predictor that has a normal distribution, whereas the negative

binomial model assumes a log gamma distribution) provides a better fit,

although the computation of probabilities requires numerical quadrature.”

• His hurdle model consists of these parts:

• A Probit model for the hurdle

• A truncated Poisson log-normal model for strictly positive outcomes


Alternatives to the Poisson regression model

• This model allows us to combine various building blocks and ideas we

discussed throughout the semester!

• For the hurdle model, assume

$$Y_i = 0 \quad \text{iff} \quad z_i = x_i'\gamma + \varepsilon_i \ge 0.$$

• For the positive part of the distribution, assume

$$Y_i \mid Y_i > 0 \sim \text{truncated Poisson}(\lambda_i),$$

where $\lambda_i = \exp(x_i'\beta + u_i)$.

• Finally, assume that $\varepsilon_i$ and $u_i$ are independently normal with variances $1$ and $\sigma^2$.
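Under the assumptions above, the outcome probabilities can be sketched in Python. The zero probability is Φ(x'γ), and the probabilities of positive counts require integrating the truncated Poisson over the normal heterogeneity term u. The sketch uses a plain trapezoid rule over ±8σ instead of the Gauss-Hermite quadrature that is more common in practice, and the index values `xg`, `xb` and `sigma` are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(u, sigma):
    """Density of N(0, sigma^2)."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def truncated_poisson_pmf(y, log_lam):
    """Poisson probability conditional on y > 0, computed in logs for stability."""
    lam = math.exp(log_lam)
    log_pmf = -lam + y * log_lam - math.lgamma(y + 1)
    return math.exp(log_pmf) / (-math.expm1(-lam))


def hurdle_pln_pmf(y, xg, xb, sigma, n_grid=401):
    """Probit hurdle with truncated Poisson log-normal positives.
    xg = x'gamma, xb = x'beta; eps and u are assumed independent."""
    p_zero = normal_cdf(xg)  # Pr[x'gamma + eps >= 0]
    if y == 0:
        return p_zero
    lo, hi = -8.0 * sigma, 8.0 * sigma
    step = (hi - lo) / (n_grid - 1)
    integral = 0.0
    for k in range(n_grid):
        u = lo + k * step
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end points
        integral += weight * truncated_poisson_pmf(y, xb + u) * normal_pdf(u, sigma) * step
    return (1.0 - p_zero) * integral


# Hypothetical index values for one individual.
xg, xb, sigma = -0.4, 0.5, 0.5
total = sum(hurdle_pln_pmf(y, xg, xb, sigma) for y in range(200))
print(f"Pr[y=0] = {hurdle_pln_pmf(0, xg, xb, sigma):.4f}, probabilities sum to {total:.6f}")
```

The log of each such probability, evaluated at the observed count, would be one individual's likelihood contribution in this simplified setting.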


Alternatives to the Poisson regression model

• The likelihood contributions are straightforward (leaving aside the fact

that the truncated Poisson looks nasty), see equation (7) on page 462.

• This model can be generalized to allow for correlation of the unobservables

εi and ui .

• The model is identified even if errors are correlated, but empirically it

might be hard to estimate the correlation parameter.

• As there is also an interpretation issue, Winkelmann prefers to report only

results for the model that assumes independence.


Results

• The average number of doctor visits per quarter declined from 2.66 to 2.35

between 1996 and 1998.

• The fraction of non-users increased by 4.4 percentage points between 1996 and 1998.

• The unemployment to population ratio captures the state of the business cycle.

• There was a general improvement in the health status of the population between 1996 and 1998.
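The raw decline in the average number of visits can be turned into a percentage with a short Python calculation. It uses only the two sample means quoted above and does not control for any covariates, so it is not a reform effect.

```python
# Average number of doctor visits per quarter (from the slide).
mean_1996, mean_1998 = 2.66, 2.35

raw_change = (mean_1998 / mean_1996 - 1) * 100
print(f"raw change in mean visits: {raw_change:.1f}%")
```

The unadjusted decline is roughly 11.7%. Part of it may reflect the changes in the business cycle and in health status noted in the other bullets, which is why the regression models control for them.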


Results

• Table II contains the results of a standard Poisson regression and of RE

and FE panel versions.

• Table III contains statistics for comparing the alternative models.

• Table IV contains the results of the Poisson log-normal model with a

Probit hurdle.

• Table V, finally, reports the main measure of the reform effect for all

models.


Poisson regression results

• On average, men have fewer

doctor visits (as always, ceteris

paribus).

• Age has a U-shaped effect on the number of doctor visits. To see what the age effect looks like, you can run the following in Stata:

twoway function y = -0.1057*x/10 + 0.1580*x^2/1000, range(15 80)

• Variables that are related to

health have the strongest effects.

• The coefficients of the year

dummies confirm that the reform

had a negative effect on the

number of doctor visits.
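The U-shape of the age effect mentioned above can also be checked numerically. The Python sketch below reuses the two coefficients from the Stata command and computes the age at which the quadratic reaches its minimum. Interpreting the function as the age part of the linear index (the log of expected visits) is an assumption based on the Poisson specification.

```python
# Coefficients as they appear in the Stata command on the slide
# (age enters as age/10 and age^2/1000).
b_age, b_age_sq = -0.1057, 0.1580


def age_effect(age):
    """Age part of the linear index."""
    return b_age * age / 10 + b_age_sq * age**2 / 1000


# Setting the derivative b_age/10 + 2*b_age_sq*age/1000 to zero gives the minimum.
turning_point = -(b_age / 10) / (2 * b_age_sq / 1000)
print(f"minimum of the age profile at about {turning_point:.1f} years")
```

The profile falls until the early thirties and rises afterwards, which matches the curve that the Stata command plots over the range 15 to 80.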


Model comparison

• According to the log likelihood and the Schwarz information criterion (which takes into account the number of parameters), the new model with a Probit hurdle and log-normal unobserved heterogeneity provides a substantial improvement (Table III).
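For reference, the Schwarz criterion in one common form is shown below. The notation is an assumption made here (maximized likelihood, k parameters, N observations); the paper may use a different sign or scaling.

```latex
\[
\mathrm{SIC} = -2 \ln \hat{L} + k \ln N
\]
```

Under this convention smaller values are better: an additional parameter is worthwhile only if it raises the log likelihood by more than (ln N)/2.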


Model comparison

• Substantively, the alternative models yield results similar to those of the Poisson model (Table V).

• However, they provide additional insights on the heterogeneity of the

reform effects.

• In particular, the reform appears to have had stronger effects for people

with fewer doctor visits or better health.


Results

Effects of the reform

• The estimates of the Poisson regression models, with or without

unobserved heterogeneity, are all in the same range, varying between 9.9

and 10.4%.

• As an interesting side note: Lauterbach, who sampled patients in

pharmacies, obtained smaller estimates. Why?

“The analysis of this paper suggests, however, a more fundamental reason,

namely the fact that the Cologne study sampled individuals on-site and

thus overrepresented heavy users. If it is the case that heavy users have a

lower demand elasticity than occasional users, the two findings can be

reconciled.” (page 469)


Results

• “The reduction is greatest at the left margin of the distribution: the

probability of being a user (at least one visit) decreased by an estimated

6.7% between 1996 and 1998, whereas the expected number of visits,

conditional on use, decreased only by an estimated 2.6%.” (page 469)

• This result is confirmed by the other two structural models: “The finite

mixture model separates the population into two groups. Two-thirds of

the population belong to a low-user group with a mean number of

quarterly visits of 1.6, and one-third belongs to a high-user group with a

mean number of 3 visits per quarter. Consistent with the above argument,

the low-user group shows a larger response to the reform, with a 13%

reduction.” (page 469)
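The two hurdle-model figures in the first quotation can be combined into an overall effect, because expected visits equal the probability of use times the expected number of visits conditional on use. The Python sketch below assumes that both quoted numbers are relative changes evaluated at the same covariate values, which is an interpretation and not something stated on the slide.

```python
# Relative changes 1996 -> 1998 quoted from the paper (page 469).
change_prob_use = -0.067   # probability of at least one visit
change_cond_mean = -0.026  # expected visits conditional on use

# E[y] = Pr[y > 0] * E[y | y > 0], so the relative changes combine multiplicatively.
total_change = ((1 + change_prob_use) * (1 + change_cond_mean) - 1) * 100
print(f"implied change in expected visits: {total_change:.1f}%")
```

Under this reading the implied overall reduction is about 9.1%, of the same order as the Poisson-based estimates, with most of it coming from the extensive margin.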


Summary

• The main substantive finding is (page 470) that “[. . . ] the German health

care reform of 1997 affected the left tail much more than the positive part

of the distribution of the number of doctor visits. To the extent that the

positive part represents the subpopulation of the seriously or chronically ill,

whereas the left end of the distribution represents the healthy, this might

have been an intended consequence of the reforms.”

• The effects derived from these before-after regressions can be interpreted

as causal if “other things did not change concurrently, beyond the

individual socioeconomic characteristics controlled for in the regression”

(page 470). Winkelmann argues that this is the case.


Summary

• With respect to the econometric modelling of count data such as the

number of doctor visits, Winkelmann concludes (page 470):

“When studying the effects of reforms on the demand for doctor visits,

hurdle or two-part models should be given serious consideration. Of

course, these models are only one possible way to generalize the rigid

assumptions underlying single index models such as the standard Poisson

or negative binomial regressions.”
