5 Panel Data - Southern Methodist...


5 Panel Data

• panel data include multiple draws on the same basic unit of observation (‘group’)

• typically, multiple draws over time, but panel data need not have a time dimension

• examples

– individuals, firms, or countries observed at multiple time periods

– multiple individuals within a household observed at a point in time

– multiple employees within a firm observed at a point in time

• can have 3 (or higher) dimensional panels (e.g., multiple individuals within a household observed at multiple points in time)

• data structure

$$\{y_{it}, x_{it}\}, \quad i = 1, \dots, N; \; t = 1, \dots, T$$

where total sample size = $NT$

– examples

∗ i indexes individuals, firms, or countries; t indexes time periods

∗ i indexes households; t indexes individuals within a household

∗ i indexes firms; t indexes employees within a firm

• in microeconometric studies, typically N is large, T is small; macroeconometrics may be the reverse

• asymptotics can be performed on $N \to \infty$, $T \to \infty$, or both

5.1 Pooled OLS

• model

$$y_{it} = \alpha + x_{it}\beta + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2)$$

• estimation via OLS

• identical to usual OLS, only now sample size is NT

• usual assumptions required for unbiasedness, consistency, etc.

• extensions

– time trend

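∗ linear time trend

$$y_{it} = \alpha + \lambda t + x_{it}\beta + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2)$$

which allows the intercept to trend linearly over time, changing by $\lambda$ each period

∗ quadratic time trend

$$y_{it} = \alpha + \lambda_1 t + \lambda_2 t^2 + x_{it}\beta + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2)$$

which allows the intercept to follow a more general time trend

– structural break

$$y_{it} = \begin{cases} \alpha_1 + x_{it}\beta_1 + \varepsilon_{it} & \text{if } t \le T \\ \alpha_2 + x_{it}\beta_2 + \varepsilon_{it} & \text{if } t > T \end{cases}$$

where T is the date of the structural break (in these conditions T denotes the break date, not the number of periods)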

∗ could have multiple breaks if panel is long enough

∗ Chow test

$$H_0: \alpha_1 = \alpha_2, \ \beta_1 = \beta_2$$

$$H_1: \text{not all equal}$$

has a test statistic of

$$F_{K+1,\, NT-2K-2} = \frac{(SSR_R - SSR_1 - SSR_2)/(K+1)}{(SSR_1 + SSR_2)/(NT - 2K - 2)}$$

where

· $SSR_R$ = SSR from pooled (restricted) model

· $SSR_1$ = SSR from OLS using only obs with $t \le T$

· $SSR_2$ = SSR from OLS using only obs with $t > T$

· K = # of x’s

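∗ alternative

· define

$$I_{it} = \begin{cases} 1 & \text{if } t > T \\ 0 & \text{if } t \le T \end{cases}$$

· estimate via OLS

$$y_{it} = \alpha_1 + \alpha_2 I_{it} + x_{it}\beta_1 + x_{it} I_{it}\beta_2 + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2)$$

and test

$$H_0: \alpha_2 = 0, \ \beta_2 = 0$$

$$H_1: \text{not all} = 0$$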
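The two versions of the structural-break test can be compared numerically. The sketch below is only an illustration on simulated data: the panel dimensions, the break date, and the coefficient values are all assumptions made up for the example, and `ssr` is a hypothetical helper. It computes the Chow statistic from the split-sample SSRs and from the single regression with the indicator and its interactions.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(0)
N, T_periods, K = 50, 10, 2      # hypothetical panel dimensions
break_date = 5                   # hypothetical break date
t = np.tile(np.arange(1, T_periods + 1), N)   # period index for each row
x = rng.normal(size=(N * T_periods, K))
post = (t > break_date).astype(float)         # indicator I_it
# Simulated data: the intercept shifts from 1 to 3 after the break.
y = 1.0 + 2.0 * post + x @ np.array([0.5, -0.5]) + rng.normal(size=N * T_periods)

const = np.ones((N * T_periods, 1))
X_pooled = np.hstack([const, x])

# Version 1: restricted (pooled) fit versus separate fits on each regime.
ssr_r = ssr(y, X_pooled)
ssr_1 = ssr(y[post == 0], X_pooled[post == 0])
ssr_2 = ssr(y[post == 1], X_pooled[post == 1])
df_num, df_den = K + 1, N * T_periods - 2 * K - 2
F_split = ((ssr_r - ssr_1 - ssr_2) / df_num) / ((ssr_1 + ssr_2) / df_den)

# Version 2: one regression with the indicator and its interactions with x.
X_inter = np.hstack([const, post[:, None], x, x * post[:, None]])
ssr_u = ssr(y, X_inter)
F_inter = ((ssr_r - ssr_u) / df_num) / (ssr_u / df_den)

print(F_split, F_inter)
```

The two statistics coincide because the fully interacted regression fits each regime separately, so its SSR equals $SSR_1 + SSR_2$.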

– time-specific intercepts

$$y_{it} = \alpha + \sum_{s=2}^{T} \lambda_s D_{st} + x_{it}\beta + \varepsilon_{it}$$

where

$$D_{st} = \begin{cases} 1 & \text{if } s = t \\ 0 & \text{otherwise} \end{cases}$$

∗ equivalent to

$$y_{it} = \sum_{s=1}^{T} \lambda_s D_{st} + x_{it}\beta + \varepsilon_{it}$$

where the constant is now omitted to avoid perfect multicollinearity

∗ λ’s capture effects of all variables that do not vary across i at a point in time

∗ any x’s that do not vary across individuals are subsumed by λ’s even if they vary over time

∗ more general than time trend since intercepts can potentially bounce all over

∗ Chow test

$$H_0: \lambda_1 = \cdots = \lambda_T$$

$$H_1: \text{not all equal}$$

has a test statistic of

$$F_{T-1,\, NT-K-T} = \frac{(SSR_R - SSR_U)/(T-1)}{SSR_U/(NT - K - T)}$$

where

· $SSR_R$ = SSR from restricted model (single intercept)

· $SSR_U$ = SSR from unrestricted model (T intercepts)

· K = # of x’s

∗ can interact time dummies with x’s to allow β’s to vary over time

∗ inclusion of all time dummies, and all interactions between time dummies and x’s, is equivalent to OLS period-by-period
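The last point can be checked directly. The following is a minimal sketch on simulated data (the dimensions and coefficients are assumptions chosen for illustration): it fits one regression with a full set of time dummies and dummy-by-x interactions, then fits a separate OLS regression in each period, and the coefficient estimates agree.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 200, 3                     # hypothetical panel dimensions
t = np.tile(np.arange(T), N)      # period of each row (0, 1, ..., T-1)
x = rng.normal(size=N * T)
# Simulated data whose intercept and slope both differ by period.
y = (1.0 + t) + (0.5 * (t + 1)) * x + rng.normal(size=N * T)

# Full set of T time dummies plus their interactions with x (no constant).
D = (t[:, None] == np.arange(T)[None, :]).astype(float)   # NT x T dummies
X_full = np.hstack([D, D * x[:, None]])
coef_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
intercepts_full, slopes_full = coef_full[:T], coef_full[T:]

# Separate OLS regressions, one per period.
by_period = []
for s in range(T):
    rows = t == s
    Xs = np.column_stack([np.ones(rows.sum()), x[rows]])
    b, *_ = np.linalg.lstsq(Xs, y[rows], rcond=None)
    by_period.append(b)
by_period = np.array(by_period)   # row s holds (intercept, slope) for period s

print(np.column_stack([intercepts_full, slopes_full]))
print(by_period)
```

Only the point estimates are compared here; the pooled regression imposes a common error variance, so its standard errors need not match those from the separate regressions.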

– difference-in-difference estimation

∗ frequently used in policy analyses

∗ examples

· What was the impact of NJ’s minimum wage hike?

· What is the impact of legalized abortion on crime?

· What is the impact of the death penalty on crime?

∗ cross-sectional model

$$y_i = \alpha + x_i\beta + \delta D_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2), \quad i = 1, \dots, N$$

where, say,

· y = unemployment rate

· x = macroeconomic variables

· D = 1 if NJ (high MW), 0 for all other states (low MW)

∗ potential shortcoming: what if there are unobservable differences between observations with the policy and those without the policy?

· e.g., if NJ is different from other states for reasons not included in x, then $Cov(D, \varepsilon) \neq 0$

· $\Longrightarrow \hat\delta_{OLS}$ (and perhaps $\hat\beta_{OLS}$) will be biased

∗ panel data offers a potential solution

∗ involves collecting data prior to policy implementation

∗ intuition

· cross-sectional model identifies δ by comparing the level of y in states with the policy to the level of y in states without the policy

· difference-in-difference model identifies δ by comparing the change in y from before to after the policy in states with the policy change to the change in y in states with no policy change

∗ panel model

$$y_{it} = \alpha + x_{it}\beta + \lambda_1 D_i + \lambda_2 D2_t + \delta D_i D2_t + \varepsilon_{it}, \quad \varepsilon_{it} \sim N(0, \sigma^2)$$

where, say,

· y = unemployment rate

· x = macroeconomic variables

· $D_i$ = 1 if NJ (‘policy changer’), 0 for all other states (no policy change)

· $D2_t$ = 1 for periods in which NJ has a high MW, 0 for previous time periods

· $D_i D2_t$ = 1 if NJ after policy change, 0 for all other observations

∗ interpretation of parameters in the panel model (ignoring x’s)

                               pre-policy change      post-policy change
                               t = 1                  t = 2
no policy change (D = 0)       α                      α + λ2
‘policy changer’ (D = 1)       α + λ1                 α + λ1 + λ2 + δ

which implies

· λ2 = difference (or change) over time in states without policy change (α + λ2 − α = λ2)

· λ2 + δ = difference (or change) over time in states with policy change (α + λ1 + λ2 + δ − (α + λ1) = λ2 + δ)

· δ = difference in the two differences (λ2 + δ − λ2 = δ), which is the additional change in states with the policy change

· $\hat\delta_{POLS}$ is known as the DID estimator

∗ notes

· λ1 captures time-invariant differences in states with the policy change vs. states with no policy change; solves the omitted variable bias problem in cross-sectional models if the relevant omitted vars do not change over time

· λ2 captures changes over time that affect all states (policy changers and non-changers) equally

· $\hat\delta_{POLS}$ = unbiased estimate of policy impact if (i) λ1 captures all differences between policy changers and non-changers, and (ii) absent the policy, the change in y over time (equal to λ2) is identical for both policy changers and non-changers

· if no x’s, then

$$\hat\delta_{POLS} = (\bar y^{1}_{2} - \bar y^{1}_{1}) - (\bar y^{0}_{2} - \bar y^{0}_{1})$$

where $\bar y^{D}_{t}$ = mean outcome in period t of states of type D
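The equality between the regression coefficient and the difference of mean differences can be verified on a toy two-period panel. This is a sketch with simulated outcomes; the number of states and the parameter values are assumptions for illustration, and `cell_mean` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states = 40                                   # hypothetical number of states
D = np.repeat(np.arange(n_states) % 2, 2)       # 1 = policy changer, by state
D2 = np.tile([0, 1], n_states)                  # 1 = post-policy period

# Simulated outcomes with alpha = 5, lambda1 = 1, lambda2 = -0.5, delta = 2.
y = 5.0 + 1.0 * D - 0.5 * D2 + 2.0 * D * D2 + rng.normal(size=2 * n_states)

# Pooled OLS on the constant, D, D2, and their interaction (no x's).
X = np.column_stack([np.ones_like(y), D, D2, D * D2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
delta_pols = coef[3]

def cell_mean(d, period):
    """Mean outcome for states of type d in the given period."""
    return y[(D == d) & (D2 == period)].mean()

delta_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(delta_pols, delta_means)
```

The match is exact (up to floating-point error) because the four regressors saturate the four group-by-period cells.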

5.2 Fixed Effects

• motivation

– OLS is biased if omitted vars are correlated with included x’s

– not always possible to find valid IVs

– if omitted var does not vary over time (time invariant), panel data can yield estimates free from omitted variable bias

• same setup as before, but allow for individual-specific intercepts

$$y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2)$$

– $\alpha_i$ = FE for group i (also referred to as an unobserved effect or unobserved heterogeneity)

– $\varepsilon_{it}$ = idiosyncratic error

• FEs subsume all time invariant x’s

• FEs capture all time invariant attributes of individual i, observable and unobservable

• pooled OLS equivalent to estimating

$$y_{it} = \alpha + x_{it}\beta + \tilde\varepsilon_{it}$$

$$\tilde\varepsilon_{it} = (\alpha_i - \alpha) + \varepsilon_{it}$$

where $\tilde\varepsilon_{it}$ is known as a composite error

– unbiasedness of $\hat\beta_{POLS}$ requires $Cov(\alpha_i, x_{it}) = 0$ and $Cov(\varepsilon_{it}, x_{it}) = 0$

– bias due to $Cov(\alpha_i, x_{it}) \neq 0$ known as heterogeneity bias

• pulling out $\alpha_i$ from the error term permits unbiased estimates of β even if $Cov(\alpha_i, x_{it}) \neq 0$

• if $Cov(\alpha_i, x_{it}) = 0$, then random effects estimation is more efficient

• if $\alpha_i = \alpha \ \forall i$, then pooled OLS is more efficient

• estimation methods when $Cov(\alpha_i, x_{it}) \neq 0$

– LSDV (Least Squares Dummy Variable Model)

– FD (first-differencing)

– mean-differencing (FE estimator; within estimator)

• STATA: -xtreg, fe fd -, -areg-
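Heterogeneity bias is easy to see in a simulation. The sketch below is illustrative only: the design (N, T, a true β of 1, and x built to be correlated with the group effect) is an assumption, and `slope` is a hypothetical helper. It compares the pooled OLS slope with the slope obtained after mean-differencing, one of the estimation methods listed above.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, beta = 500, 5, 1.0                 # hypothetical design
alpha_i = rng.normal(size=(N, 1))        # group effects
x = alpha_i + rng.normal(size=(N, T))    # x is correlated with alpha_i by construction
y = alpha_i + beta * x + rng.normal(size=(N, T))

def slope(y_vec, x_vec):
    """OLS slope from regressing y on a constant and a single x."""
    xc = x_vec - x_vec.mean()
    return float(xc @ (y_vec - y_vec.mean()) / (xc @ xc))

# Pooled OLS leaves alpha_i in the composite error.
b_pols = slope(y.ravel(), x.ravel())

# Mean-differencing within each group removes alpha_i before estimation.
x_within = (x - x.mean(axis=1, keepdims=True)).ravel()
y_within = (y - y.mean(axis=1, keepdims=True)).ravel()
b_within = slope(y_within, x_within)

print(f"pooled OLS: {b_pols:.3f}, within: {b_within:.3f}, true beta: {beta}")
```

In this design the pooled slope converges to β + Cov(x, α)/Var(x) = 1.5 rather than 1, while the within slope stays close to the true value.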

• LSDV (Least Squares Dummy Variable Model)

$$y_{it} = \sum_{j=1}^{N} \alpha_j D_{ji} + x_{it}\beta + \varepsilon_{it}$$

where

$$D_{ji} = \begin{cases} 1 & \text{if } j = i \\ 0 & \text{otherwise} \end{cases}$$

• amounts to including N dummy vars, 1 for each group

• estimated by pooled OLS

• $\hat\beta_{LSDV}$ is consistent even if $Cov(\alpha_i, x_{it}) \neq 0$ (included regressors are always allowed to be correlated with one another)

• only feasible computationally if N is of reasonable size

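• FD (first-differencing)

$$y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}$$

– implies

$$\begin{aligned} y_{i1} &= \alpha_i + x_{i1}\beta + \varepsilon_{i1} \\ &\ \ \vdots \\ y_{iT} &= \alpha_i + x_{iT}\beta + \varepsilon_{iT} \end{aligned}$$

– taking differences between consecutive years yields

$$\begin{aligned} y_{i2} - y_{i1} &= (x_{i2} - x_{i1})\beta + (\varepsilon_{i2} - \varepsilon_{i1}) \\ &\ \ \vdots \\ y_{iT} - y_{iT-1} &= (x_{iT} - x_{iT-1})\beta + (\varepsilon_{iT} - \varepsilon_{iT-1}) \end{aligned}$$

or, using new notation,

$$\begin{aligned} \Delta y_{i2} &= \Delta x_{i2}\beta + \Delta\varepsilon_{i2} \\ &\ \ \vdots \\ \Delta y_{iT} &= \Delta x_{iT}\beta + \Delta\varepsilon_{iT} \end{aligned}$$

where ∆ represents the change from the preceding year

– the model to be estimated is

$$\Delta y_{it} = \Delta x_{it}\beta + \Delta\varepsilon_{it}, \quad i = 1, \dots, N; \; t = 2, \dots, T$$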

– notes

∗ FD the data, then regress $\Delta y_{it}$ on $\Delta x_{it}$ using N(T − 1) observations

∗ interpretation of $\hat\beta_{FD}$ is same as original β

∗ differencing eliminates $\alpha_i$ and any time invariant x’s

∗ consistency requires $Cov(\Delta x_{it}, \Delta\varepsilon_{it}) = 0$

· known as strict exogeneity

· requires $Cov(x_{it}, \varepsilon_{it}) = Cov(x_{it}, \varepsilon_{it-1}) = Cov(x_{it}, \varepsilon_{it+1}) = 0 \ \forall t$

∗ estimator of FEs

$$\hat\alpha_i = \bar y_i - \bar x_i\hat\beta$$

which is unbiased, but consistency requires $T \to \infty$

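• mean-differencing (FE estimator; within estimator)

$$y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}$$

– implies

$$\bar y_i = \alpha_i + \bar x_i\beta + \bar\varepsilon_i$$

and

$$\bar{\bar y} = \alpha + \bar{\bar x}\beta + \bar{\bar\varepsilon}$$

where bars indicate average over T obs within group; double bars indicate average over entire sample (so α here is the average of the $\alpha_i$)

– taking differences yields

$$y_{it} - \bar y_i = (x_{it} - \bar x_i)\beta + (\varepsilon_{it} - \bar\varepsilon_i)$$

or, using new notation,

$$\ddot y_{it} = \ddot x_{it}\beta + \ddot\varepsilon_{it}$$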

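– alternative representation

$$(y_{it} - \bar y_i) + \bar{\bar y} = \alpha + [(x_{it} - \bar x_i) + \bar{\bar x}]\beta + [(\varepsilon_{it} - \bar\varepsilon_i) + \bar{\bar\varepsilon}]$$

or, using new notation,

$$\tilde y_{it} = \alpha + \tilde x_{it}\beta + \tilde\varepsilon_{it}$$

– notes

∗ demean the data, then regress the demeaned y on the demeaned x, or $\tilde y$ on $\tilde x$, using NT observations

∗ need to adjust degrees of freedom due to estimation of means

∗ interpretation of $\hat\beta_{FE}$ is same as original β

∗ differencing eliminates $\alpha_i$ and any time invariant x’s

∗ consistency requires $Cov(x_{it} - \bar x_i, \ \varepsilon_{it} - \bar\varepsilon_i) = 0$

· again, strict exogeneity

· requires $x_{it}$ to be independent of error term from every time period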

• comparisons

– LSDV and mean-differencing are identical

– T = 2 $\Longrightarrow$ all three are identical

– T ≥ 3 $\Longrightarrow$ FD differs from LSDV/mean-differencing, but both are unbiased

• extensions

– time dummies (DID estimator)
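These comparisons can be confirmed numerically. The sketch below uses simulated data with a single regressor; `fe_estimates` and `simulate` are hypothetical helpers, and the design values are assumptions for illustration. It computes the LSDV, within, and FD slopes for a panel with T = 2 and another with T = 4.

```python
import numpy as np

def ols(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def fe_estimates(y, x):
    """Return (LSDV, within, FD) slope estimates for N x T arrays y and x."""
    N, T = y.shape
    # LSDV: one dummy per group plus x, estimated by pooled OLS.
    dummies = np.kron(np.eye(N), np.ones((T, 1)))          # NT x N
    b_lsdv = ols(y.ravel(), np.column_stack([dummies, x.ravel()]))[-1]
    # Within: subtract group means, then regress with no constant.
    yw = (y - y.mean(axis=1, keepdims=True)).ravel()
    xw = (x - x.mean(axis=1, keepdims=True)).ravel()
    b_within = ols(yw, xw[:, None])[0]
    # FD: difference consecutive periods, then regress with no constant.
    dy, dx = np.diff(y, axis=1).ravel(), np.diff(x, axis=1).ravel()
    b_fd = ols(dy, dx[:, None])[0]
    return b_lsdv, b_within, b_fd

def simulate(N, T, rng):
    """Simulated panel with group effects correlated with x and beta = 1."""
    alpha_i = rng.normal(size=(N, 1))
    x = alpha_i + rng.normal(size=(N, T))
    y = alpha_i + 1.0 * x + rng.normal(size=(N, T))
    return y, x

rng = np.random.default_rng(4)
lsdv2, within2, fd2 = fe_estimates(*simulate(100, 2, rng))   # T = 2
lsdv4, within4, fd4 = fe_estimates(*simulate(100, 4, rng))   # T = 4

print(lsdv2, within2, fd2)
print(lsdv4, within4, fd4)
```

With T = 2 all three slopes agree to floating-point precision; with T = 4 the LSDV and within slopes still agree, while the FD slope is a different (but also consistent) estimate.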

5.3 Random Effects

• motivation

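– if $Cov(x_{it}, \alpha_i) = 0$, then estimating N parameters $\alpha_i$ is inefficient (equivalently, losing N obs to FD or MD is inefficient)

– but, if $\alpha_i \neq \alpha$ for some i, then pooled OLS yields incorrect std errors since $Cov(\tilde\varepsilon_{it}, \tilde\varepsilon_{it'}) \neq 0$ for $t \neq t'$ (within-group serial correlation)

$$Cov(\tilde\varepsilon_{it}, \tilde\varepsilon_{it'}) = \begin{cases} \sigma^2_\alpha + \sigma^2_\varepsilon & \text{if } t = t' \\ \sigma^2_\alpha & \text{if } t \neq t' \end{cases}$$

which implies positive serial correlation within groups

– solution

∗ leave $\alpha_i$ as part of the composite error

∗ transform the data to a model with serially uncorrelated errors

∗ known as Generalized Least Squares (GLS) estimation in general, RE estimation in this special case

• same setup as before

$$y_{it} = \alpha + x_{it}\beta + \tilde\varepsilon_{it}, \quad \varepsilon_{it} \sim N(0, \sigma^2_\varepsilon)$$

where $\tilde\varepsilon_{it}$ is the composite error term (the group effect plus the idiosyncratic error $\varepsilon_{it}$)

• assume $\alpha_i \overset{iid}{\sim} N(0, \sigma^2_\alpha)$, the RE for group i

• assume $Cov(\alpha_i, \varepsilon_{it}) = 0 \ \forall t$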

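• RE estimation

– transform the data to a model with serially uncorrelated errors

– RE covariance structure

$$\underbrace{\Sigma_i}_{T \times T} = \begin{bmatrix} \sigma^2_\alpha + \sigma^2_\varepsilon & \sigma^2_\alpha & \cdots & \sigma^2_\alpha \\ \sigma^2_\alpha & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \sigma^2_\alpha \\ \sigma^2_\alpha & \cdots & \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_\varepsilon \end{bmatrix}$$

and

$$\underbrace{\Sigma}_{NT \times NT} = \begin{bmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_N \end{bmatrix}$$

– define

$$\lambda = 1 - \sqrt{\frac{\sigma^2_\varepsilon}{T\sigma^2_\alpha + \sigma^2_\varepsilon}} \in [0, 1]$$

– λ-difference the data

$$y_{it} - \lambda\bar y_i = (x_{it} - \lambda\bar x_i)\beta + (\tilde\varepsilon_{it} - \lambda\bar{\tilde\varepsilon}_i)$$

implies $\mu_{it} \equiv \tilde\varepsilon_{it} - \lambda\bar{\tilde\varepsilon}_i \overset{iid}{\sim} N(0, \sigma^2_\mu)$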

• steps

– estimate the model using fixed effect methods or pooled OLS

– obtain an estimate, $\hat\lambda$

– difference the data using $\hat\lambda$

– regress $y_{it} - \hat\lambda\bar y_i$ on $x_{it} - \hat\lambda\bar x_i$

• notes

– special cases

∗ λ = 0 $\Longrightarrow$ pooled OLS

∗ λ = 1 $\Longrightarrow$ FE estimation

– RE allows time invariant x’s

– consistency requires $Cov(\alpha_i, x_{it}) = Cov(\varepsilon_{it}, x_{it}) = 0$

• STATA: -xtreg, re-
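The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure any particular package uses: the data are simulated, `quasi_demeaned_slope` is a hypothetical helper, and the variance components are estimated with one common choice (within residuals for $\sigma^2_\varepsilon$, a regression on group means for $\sigma^2_\alpha$), which the notes do not specify.

```python
import numpy as np

def quasi_demeaned_slope(y, x, lam):
    """Slope from OLS on lambda-differenced data (y, x are N x T arrays)."""
    y_star = (y - lam * y.mean(axis=1, keepdims=True)).ravel()
    x_star = (x - lam * x.mean(axis=1, keepdims=True)).ravel()
    const_star = np.full_like(y_star, 1.0 - lam)   # the constant is transformed too
    X = np.column_stack([const_star, x_star])
    beta, *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return float(beta[1])

rng = np.random.default_rng(5)
N, T = 300, 4                                   # hypothetical design
alpha_i = rng.normal(scale=1.0, size=(N, 1))    # RE, uncorrelated with x
x = rng.normal(size=(N, T))
y = 2.0 + 1.0 * x + alpha_i + rng.normal(scale=1.0, size=(N, T))

# Step 1: variance components (assumed approach, see the lead-in).
xw = x - x.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)
b_within = float((xw * yw).sum() / (xw ** 2).sum())
sigma2_eps = ((yw - b_within * xw) ** 2).sum() / (N * (T - 1) - 1)

xb, yb = x.mean(axis=1), y.mean(axis=1)
Xb = np.column_stack([np.ones(N), xb])
bb, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
sigma2_between = ((yb - Xb @ bb) ** 2).sum() / (N - 2)
sigma2_alpha = max(sigma2_between - sigma2_eps / T, 0.0)

# Step 2: lambda as defined in the text.
lam_hat = 1.0 - np.sqrt(sigma2_eps / (T * sigma2_alpha + sigma2_eps))

# Steps 3 and 4: lambda-difference the data and run OLS.
b_re = quasi_demeaned_slope(y, x, lam_hat)
b_pols = quasi_demeaned_slope(y, x, 0.0)   # lambda = 0 reproduces pooled OLS
b_fe = quasi_demeaned_slope(y, x, 1.0)     # lambda = 1 reproduces the within slope

print(lam_hat, b_re, b_pols, b_fe)
```

Setting `lam` to 0 or 1 in the helper recovers the two special cases listed in the notes, which is a quick way to sanity-check the transformation.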

5.4 Specification Tests

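• Hausman test of FE vs. RE

– intuition

∗ if $Cov(\alpha_i, x_{it}) = 0$, then RE and FE are both consistent, but RE is more efficient $\Longrightarrow \hat\beta_{RE} \approx \hat\beta_{FE}$

∗ if $Cov(\alpha_i, x_{it}) \neq 0$, then RE is inconsistent, but FE is consistent $\Longrightarrow \hat\beta_{RE} \neq \hat\beta_{FE}$

– define test statistic based on difference $\hat\beta_{FE} - \hat\beta_{RE}$

$$H = T(\hat\beta_{FE} - \hat\beta_{RE})' (\hat\Sigma_{FE} - \hat\Sigma_{RE})^{-1} (\hat\beta_{FE} - \hat\beta_{RE}) \sim \chi^2_K$$

where K = # of x’s

– if test statistic is too large, then reject $Cov(\alpha_i, x_{it}) = 0$

– STATA: -hausman-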
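As a rough illustration of the quadratic form, the sketch below computes a Hausman-type statistic from two coefficient vectors and their covariance matrices. It assumes `V_fe` and `V_re` are the estimated covariance matrices of the coefficient estimates themselves, so any sample-size scaling is already absorbed and no separate factor multiplies the quadratic form. The numbers are made up for the example, and `hausman_statistic` is a hypothetical helper.

```python
import numpy as np

def hausman_statistic(b_fe, V_fe, b_re, V_re):
    """Quadratic form in the FE-RE coefficient difference.

    V_fe and V_re are assumed to be the estimated covariance matrices of
    b_fe and b_re themselves.
    """
    diff = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    V_diff = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    # Solve V_diff @ z = diff rather than forming the inverse explicitly.
    return float(diff @ np.linalg.solve(V_diff, diff))

# Hypothetical estimates for K = 2 regressors (illustrative numbers only).
b_fe = [1.0, 2.0]
b_re = [0.9, 2.1]
V_fe = np.diag([0.02, 0.03])
V_re = np.diag([0.01, 0.01])

H = hausman_statistic(b_fe, V_fe, b_re, V_re)
print(H)   # compare with a chi-square(K) critical value; 5.99 at 5% for K = 2
```

In finite samples the difference of the two covariance matrices need not be positive definite, in which case the statistic can be negative or undefined; the sketch does not guard against that.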

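• RE vs. pooled OLS

– hypothesis

$$H_0: \sigma^2_\alpha = 0$$

$$H_1: \sigma^2_\alpha \neq 0$$

– Breusch-Pagan (1980) test

$$\lambda_{LM} = \frac{NT}{2(T-1)} \left[ \frac{\sum_{i=1}^{N} (T\bar{\hat\varepsilon}_i)^2}{\sum_{i=1}^{N}\sum_{t=1}^{T} \hat\varepsilon^2_{it}} - 1 \right]^2 \sim \chi^2_1$$

– STATA: -xttest1- after -xtreg, re-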
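The LM statistic is straightforward to compute from a residual array. The sketch below assumes the residuals come from pooled OLS and uses simulated data with and without a random effect; both helper functions and all design values are hypothetical.

```python
import numpy as np

def breusch_pagan_lm(resid):
    """LM statistic for H0: sigma_alpha^2 = 0 from an N x T residual array."""
    N, T = resid.shape
    group_sums = resid.sum(axis=1)               # T * (group mean residual)
    ratio = (group_sums ** 2).sum() / (resid ** 2).sum()
    return N * T / (2.0 * (T - 1)) * (ratio - 1.0) ** 2

def pooled_ols_residuals(y, x):
    """Residuals from pooled OLS of y on a constant and one x (N x T arrays)."""
    X = np.column_stack([np.ones(y.size), x.ravel()])
    beta, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return (y.ravel() - X @ beta).reshape(y.shape)

rng = np.random.default_rng(6)
N, T = 200, 5                                    # hypothetical design
x = rng.normal(size=(N, T))
eps = rng.normal(size=(N, T))
alpha_i = rng.normal(size=(N, 1))

lm_with_re = breusch_pagan_lm(pooled_ols_residuals(1.0 + x + alpha_i + eps, x))
lm_without_re = breusch_pagan_lm(pooled_ols_residuals(1.0 + x + eps, x))
print(lm_with_re, lm_without_re)   # 5% critical value for chi-square(1) is 3.84
```

When a group effect is present, the squared group sums of the residuals are large relative to the total sum of squares, which pushes the bracketed ratio away from 1.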

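• groupwise heteroskedasticity

– errors are homoskedastic within groups, heteroskedastic across groups

– e.g., errors for a given individual have same variance in each period, but each individual has a unique variance

– structure:

$$\underbrace{\Sigma_i}_{T \times T} = \begin{bmatrix} \sigma^2_i & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \sigma^2_i \end{bmatrix}$$

and

$$\underbrace{\Sigma}_{NT \times NT} = \begin{bmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_N \end{bmatrix}$$

– hypothesis

$$H_0: \sigma^2_i = \sigma^2 \ \forall i$$

$$H_1: \sigma^2_i \neq \sigma^2 \text{ for some } i$$

– modified Wald test statistic

$$W' = \sum_{i=1}^{N} \frac{(\hat\sigma^2_i - \hat\sigma^2)^2}{V_i} \sim \chi^2_N$$

where

$$V_i = \frac{1}{T-1} \sum_{t=1}^{T} (\hat\varepsilon^2_{it} - \hat\sigma^2_i)^2$$

$$\hat\sigma^2_i = \frac{1}{T} \sum_{t=1}^{T} \hat\varepsilon^2_{it}$$

– notes

∗ valid in presence of non-normality

∗ lower power in ‘large N, small T’ FE models

– STATA: -xttest3- after -xtreg, fe-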

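• cross-sectional dependence

$$H_0: Cov(\varepsilon_{it}, \varepsilon_{jt}) = 0 \ \forall i \neq j$$

$$H_1: Cov(\varepsilon_{it}, \varepsilon_{jt}) \neq 0 \text{ for some } i \neq j$$

– T > N

∗ Breusch-Pagan (1980) test

$$\lambda_{LM} = T \sum_{i=2}^{N} \sum_{j=1}^{i-1} \hat\rho^2_{ij} \sim \chi^2_d$$

where

· d = N(N − 1)/2

· $\hat\rho_{ij} = Corr(\hat\varepsilon_i, \hat\varepsilon_j)$, $i \neq j$; specifically,

$$\hat\rho_{ij} = \frac{\sum_{t=1}^{T} \hat\varepsilon_{it}\hat\varepsilon_{jt}}{\sqrt{\sum_{t=1}^{T} \hat\varepsilon^2_{it}} \sqrt{\sum_{t=1}^{T} \hat\varepsilon^2_{jt}}}$$

∗ intuition

· compute N×N correlation matrix

$$R = \begin{bmatrix} \rho_{11} & \cdots & \cdots & \rho_{1N} \\ \vdots & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ \rho_{N1} & \cdots & \cdots & \rho_{NN} \end{bmatrix}$$

· no correlation $\Longrightarrow R = I_N$

∗ test does not have good statistical properties when T < N, and likely to do worse as $N \to \infty$

∗ STATA: -xttest2- after -xtreg, fe-

– T < N

∗ Pesaran (2004) test

$$\lambda_{CD} = \sqrt{\frac{2T}{N(N-1)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \hat\rho_{ij} \sim N(0, 1)$$

∗ STATA: -xtcsd, pes- after -xtreg, fe re-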
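Both statistics are simple functions of the pairwise residual correlations. The sketch below is an illustration only: it uses simulated arrays as stand-ins for regression residuals, the helper names are hypothetical, and the correlation is computed in the uncentered form given in the text.

```python
import numpy as np

def residual_correlations(resid):
    """N x N matrix of rho_ij from an N x T residual array (uncentered, as in the text)."""
    cross = resid @ resid.T                      # sum over t of e_it * e_jt
    norms = np.sqrt(np.diag(cross))
    return cross / np.outer(norms, norms)

def bp_lm_and_pesaran_cd(resid):
    """Return (Breusch-Pagan LM, Pesaran CD) for an N x T residual array."""
    N, T = resid.shape
    rho = residual_correlations(resid)
    # Each pair (i, j) with i < j once; by symmetry this matches the
    # lower-triangle sum used in the LM formula.
    upper = rho[np.triu_indices(N, k=1)]
    lm = T * (upper ** 2).sum()
    cd = np.sqrt(2.0 * T / (N * (N - 1))) * upper.sum()
    return lm, cd

rng = np.random.default_rng(7)
N, T = 5, 100                                    # hypothetical design
idiosyncratic = rng.normal(size=(N, T))
common_shock = rng.normal(size=(1, T))           # hits every unit in period t

lm_dep, cd_dep = bp_lm_and_pesaran_cd(idiosyncratic + common_shock)
lm_ind, cd_ind = bp_lm_and_pesaran_cd(idiosyncratic)
print(lm_dep, cd_dep)
print(lm_ind, cd_ind)
```

The LM statistic sums squared correlations, so it cannot distinguish positive from negative dependence, whereas the CD statistic sums the correlations themselves and can be small when positive and negative correlations offset.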

5.5 Dynamic Panel Model

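• model

$$y_{it} = x_{it}\beta + \gamma y_{it-1} + \alpha_i + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2)$$

where i = 1, ..., N; t = 2, ..., T; and T ≥ 3

• estimation

– even if $Cov(\alpha_i, x_{it}) = 0$, RE not applicable since $Cov(\alpha_i, y_{it-1}) \neq 0$

– need FE/FD estimator

– FD $\Longrightarrow$

$$\Delta y_{it} = \Delta x_{it}\beta + \gamma\Delta y_{it-1} + \Delta\varepsilon_{it}, \quad i = 1, \dots, N; \; t = 3, \dots, T$$

– but this model is not estimable by OLS since $Cov(\Delta\varepsilon_{it}, \Delta y_{it-1}) \neq 0$, which follows from $Cov(\varepsilon_{it-1}, y_{it-1}) \neq 0$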

• solutions

– FD, then estimate via IV, treating $\Delta y_{it-1}$ as endogenous

– what are potential IVs?

∗ need vars that are correlated with $\Delta y_{it-1}$, uncorrelated with $\Delta\varepsilon_{it}$

∗ suitable candidates

· $x_{it-2}$ (through $y_{it-2}$)

· $y_{it-2}$ (enters $\Delta y_{it-1} = y_{it-1} - y_{it-2}$ directly)

· $y_{it-3}, y_{it-4}, y_{it-5}, \dots$ (through autoregressive process), e.g.,

$$Cov(\Delta y_{it-1}, y_{it-3}) = Cov(y_{it-1}, y_{it-3}) - Cov(y_{it-2}, y_{it-3}) \neq 0$$

∗ lots of instruments (beware of weak IVs)

– simple solution: FD, then use TSLS with $x_{it-2}$, $y_{it-2}$ as IVs
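The simple solution can be illustrated with a small simulation. The sketch below is a stripped-down version under stated assumptions: there are no x's, $y_{it-2}$ is the only instrument (so TSLS reduces to the just-identified IV ratio), and the design values (N, T, γ = 0.5, the burn-in length) are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(8)
N, T, burn_in, gamma = 5000, 8, 50, 0.5      # hypothetical design
alpha_i = rng.normal(scale=0.5, size=N)

# Simulate y_it = gamma * y_it-1 + alpha_i + eps_it, discarding a burn-in.
y = np.zeros((N, burn_in + T))
for t in range(1, burn_in + T):
    y[:, t] = gamma * y[:, t - 1] + alpha_i + rng.normal(size=N)
y = y[:, burn_in:]                           # keep the last T periods

dy = np.diff(y, axis=1)                      # column k holds y_{k+1} - y_k
dep = dy[:, 1:].ravel()                      # Delta y_it
lag = dy[:, :-1].ravel()                     # Delta y_it-1
z = y[:, :-2].ravel()                        # y_it-2, the instrument

# OLS on the differenced model: inconsistent because Delta y_it-1 and
# Delta eps_it share eps_it-1.
gamma_fd_ols = float(lag @ dep / (lag @ lag))

# Just-identified IV (equivalent to TSLS with one instrument).
gamma_iv = float(z @ dep / (z @ lag))

print(f"FD-OLS: {gamma_fd_ols:.3f}, FD-IV: {gamma_iv:.3f}, true gamma: {gamma}")
```

In this design OLS on the differenced data converges to (γ − 1)/2 = −0.25, while the IV estimate should land near the true value of 0.5.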

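– more complex solution

∗ estimation by GMM to utilize more instruments

· writing out model for each period yields

$$\begin{aligned} \Delta y_{i3} &= \Delta x_{i3}\beta + \gamma\Delta y_{i2} + \Delta\varepsilon_{i3} \\ &\ \ \vdots \\ \Delta y_{iT} &= \Delta x_{iT}\beta + \gamma\Delta y_{iT-1} + \Delta\varepsilon_{iT} \end{aligned}$$

where IVs for $\Delta y_{i2}$ are $x_{i1}, y_{i1}$; IVs for $\Delta y_{i3}$ are $x_{i2}, y_{i2}, y_{i1}$; ...; IVs for $\Delta y_{iT-1}$ are $x_{iT-2}, y_{iT-2}, \dots, y_{i2}, y_{i1}$

· not usual TSLS set-up

· GMM allows moment conditions to be derived using as many IVs as desired

· requires $\varepsilon_{it}$ to be serially uncorrelated; or, equivalently, $\Delta\varepsilon_{it}$ should exhibit first-order serial correlation but none at higher orders

∗ STATA: -xtabond- (Arellano & Bond 1991)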

• persistence

– Blundell & Bond (1998) show that if |γ| > 0.8 or so, TSLS and A-B estimator do not work very well (weak IVs)

– solution

∗ add additional moment conditions derived from the model in levels

$$y_{it} = x_{it}\beta + \gamma y_{it-1} + \alpha_i + \varepsilon_{it}$$

∗ what are IVs for $y_{it-1}$?

· $\Delta y_{it-1}$ (independent of $\alpha_i$, $\varepsilon_{it}$)

· $\Delta x_{it-1}$ (independent of $\alpha_i$, $\varepsilon_{it}$)

· $\Delta y_{it-2}, \Delta y_{it-3}, \dots$ (through autoregressive process)

– STATA: -xtabond2- (system estimator)