Pooled Cross-Section Time Series Data


Transcript of Pooled Cross-Section Time Series Data

Page 1: Pooled Cross-Section Time Series Data

Pooled Cross-Section Time Series Data

Wooldridge Chapters 13 and 14

Page 2: Pooled Cross-Section Time Series Data

Types of Data

Pooled Cross Sections: Independent cross-section data at different points in time.

Panel / Longitudinal: Uniquely identified cross-section units (i) followed over time.
• Balanced Panel: All i appear in every period.
• Unbalanced Panel: Some i are missing for some time periods.
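The balanced/unbalanced distinction can be checked mechanically. The following is a minimal sketch, assuming the panel is available as a list of (i, t) pairs; the function name `is_balanced` and both sample panels are hypothetical illustrations, not part of the original slides.

```python
from collections import defaultdict

def is_balanced(observations):
    """Return True if every unit i appears in every period t.

    `observations` is an iterable of (i, t) pairs, one per row of the panel.
    """
    periods_by_unit = defaultdict(set)
    all_periods = set()
    for i, t in observations:
        periods_by_unit[i].add(t)
        all_periods.add(t)
    return all(periods == all_periods for periods in periods_by_unit.values())

# Hypothetical panels with 3 units and 2 periods.
balanced = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]
unbalanced = [(1, 1), (1, 2), (2, 1), (3, 1), (3, 2)]  # unit 2 missing in t=2

print(is_balanced(balanced))    # True
print(is_balanced(unbalanced))  # False
```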

Page 3: Pooled Cross-Section Time Series Data

Example: Two Period Panel Data (N=4, T=2)

i   t   Consumption (Y)   Income (X)
1   1         72              98
1   2         75             102
2   1         31              40
2   2         26              39
3   1         55              66
3   2         62              70
4   1         41              59
4   2         45              60

Page 4: Pooled Cross-Section Time Series Data

$$Y_{it} = B_0 + B_1 X_{it} + e_{it}$$

$B_1 = 0.72$, but how to interpret?

[Figure: scatter of y and fitted values against x]

Page 5: Pooled Cross-Section Time Series Data

Interpreting Coefficients

$$Y_{it} = B_0 + B_1 X_{it} + e_{it}$$

$$B_1 = \frac{\Delta Y_{it}}{\Delta X_{it}} = \frac{Y_{it} - Y_{jt}}{X_{it} - X_{jt}}$$

Change in Y across individuals (i vs. j) at time t.

$$B_1 = \frac{\Delta Y_{it}}{\Delta X_{it}} = \frac{Y_{i,t} - Y_{i,t-1}}{X_{i,t} - X_{i,t-1}}$$

Change in Y over time for a given individual i.
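As a worked illustration with the two-period example data (the choice of units 1 and 2 is arbitrary), the two ratios can be computed directly. Across individuals at t = 1:

$$\frac{Y_{1,1} - Y_{2,1}}{X_{1,1} - X_{2,1}} = \frac{72 - 31}{98 - 40} = \frac{41}{58} \approx 0.71$$

Over time for individual 1:

$$\frac{Y_{1,2} - Y_{1,1}}{X_{1,2} - X_{1,1}} = \frac{75 - 72}{102 - 98} = \frac{3}{4} = 0.75$$

A single pooled coefficient blends both kinds of variation, which is why its interpretation is ambiguous.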

Page 6: Pooled Cross-Section Time Series Data

Use intercept dummies to differentiate between “time” and “type” effects

Time Dummies: the effect of being in time period 2 vs. time period 1 on the expected value of $Y_{it}$, holding all else constant.

Type Dummies: the effect of being of type B vs. type A on the expected value of $Y_{it}$, holding all else constant.

Page 7: Pooled Cross-Section Time Series Data

Time Dummies

Let $D_{2,t} = 0$ if $t = 1$, and $D_{2,t} = 1$ if $t = 2$.

$$Y_{it} = B_0 + \delta_T D_{2,t} + e_{it}$$

$$\delta_T = (\bar{Y}_2 - \bar{Y}_1)$$

where $\bar{Y}_2$ is the mean at time 2 across all i.

Page 8: Pooled Cross-Section Time Series Data

Example: Two Period Panel Data with Time Dummy

i   t   D_T   Y_it   X_it
1   1    0     72     98
1   2    1     75    102
2   1    0     31     40
2   2    1     26     39
3   1    0     55     66
3   2    1     62     70
4   1    0     41     59
4   2    1     45     60

Page 9: Pooled Cross-Section Time Series Data

Time Dummy Example

sum y if t==1

    Variable |  Obs    Mean
-------------+--------------
           y |    4   49.75

sum y if t==2

    Variable |  Obs    Mean
-------------+--------------
           y |    4      52

reg y dt
Coeff = 52 - 49.75 = 2.25
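The equality between the time-dummy coefficient and the difference in period means can be reproduced outside Stata. This is a small sketch in plain Python using the example panel; the helper names `mean` and `ols_slope` are illustrative choices, not anything from the slides.

```python
# Two-period example panel from the slides: (i, t, y, x).
panel = [
    (1, 1, 72, 98), (1, 2, 75, 102),
    (2, 1, 31, 40), (2, 2, 26, 39),
    (3, 1, 55, 66), (3, 2, 62, 70),
    (4, 1, 41, 59), (4, 2, 45, 60),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

y = [row[2] for row in panel]
dt = [1 if row[1] == 2 else 0 for row in panel]  # time dummy D_{2,t}

mean_t1 = mean(yi for yi, d in zip(y, dt) if d == 0)
mean_t2 = mean(yi for yi, d in zip(y, dt) if d == 1)
delta_t = ols_slope(dt, y)  # coefficient on the time dummy

print(mean_t1, mean_t2, delta_t)  # 49.75 52.0 2.25
```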

Page 10: Pooled Cross-Section Time Series Data

Time dummy represents the shift of the regression line from period 1 to period 2 when $Y_{it}$ is regressed on the dummy along with $X_{it}$:

[Figure: y and fitted values against x, with separate fitted lines for t=1 and t=2; the shift between them is labeled $\delta_T = 2.25$]

Page 11: Pooled Cross-Section Time Series Data

Type Dummies

Separate the cross-sectional dimension of the sample into qualitative “types” (e.g. male vs. female, rural vs. urban, foreign vs. domestic, treatment vs. control, etc.)

Let $D_{iB} = 1$ if individual i is Type B, $= 0$ otherwise.

$$Y_{it} = B_0 + \delta_B D_{iB} + e_{it}$$

$$\delta_B = (\bar{Y}_B - \bar{Y}_A)$$

When $X_{it}$ is included in the regression, $\delta_B$ represents the shift in intercept.

Page 12: Pooled Cross-Section Time Series Data

Example: Two Period Panel Data with Type Dummy

i   t   Type   D_B   Y_it   X_it
1   1    A      0     72     98
1   2    A      0     75    102
2   1    B      1     31     40
2   2    B      1     26     39
3   1    B      1     55     66
3   2    B      1     62     70
4   1    A      0     41     59
4   2    A      0     45     60

Page 13: Pooled Cross-Section Time Series Data

From Simple Example

reg y db

           y |   Coef.   Std. Err.       t
-------------+-----------------------------
          db |  -14.75   12.51582    -1.18
       _cons |   58.25   8.850024     6.58

sum y if db==1

    Variable |  Obs    Mean
-------------+--------------
           y |    4    43.5

sum y if db==0

    Variable |  Obs    Mean
-------------+--------------
           y |    4   58.25

Coefficient = difference in means = 43.5 - 58.25 = -14.75
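The coefficient, standard error, and t statistic in the `reg y db` output can be reproduced by hand. This is a sketch that assumes the usual OLS formula for a regression on a single dummy, where residuals are deviations from each group's mean and the degrees of freedom are n - 2; the variable names are illustrative.

```python
import math

# Example panel from the slides: (i, t, type, y).
panel = [
    (1, 1, "A", 72), (1, 2, "A", 75),
    (2, 1, "B", 31), (2, 2, "B", 26),
    (3, 1, "B", 55), (3, 2, "B", 62),
    (4, 1, "A", 41), (4, 2, "A", 45),
]

y_a = [y for _, _, g, y in panel if g == "A"]
y_b = [y for _, _, g, y in panel if g == "B"]
mean_a = sum(y_a) / len(y_a)
mean_b = sum(y_b) / len(y_b)

# Coefficient on the type dummy is the difference in group means.
coef = mean_b - mean_a

# Residual variance: squared deviations from each group's own mean,
# divided by n - 2 (intercept and dummy are estimated).
ssr = sum((y - mean_a) ** 2 for y in y_a) + sum((y - mean_b) ** 2 for y in y_b)
s2 = ssr / (len(panel) - 2)
se = math.sqrt(s2 * (1 / len(y_a) + 1 / len(y_b)))
t_stat = coef / se

print(round(coef, 2), round(se, 5), round(t_stat, 2))  # -14.75 12.51582 -1.18
```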

Page 14: Pooled Cross-Section Time Series Data

Type dummy represents the shift of the regression line from type A to type B when $Y_{it}$ is regressed on the dummy along with $X_{it}$:

[Figure: y and fitted values against x, with separate fitted lines for type=A and type=B; the shift between them is labeled $\delta_B = -14.75$]

Page 15: Pooled Cross-Section Time Series Data

Difference-in-Differences Estimator

Estimates the difference across types, and over time, using a simple dummy variable framework.

Excellent for policy analysis. Takes advantage of the “natural experiment” quality of panel data.

Can be expanded beyond the two-period framework.

Examples: stadium construction, natural disaster, water treatment facility, tax cuts.

Page 16: Pooled Cross-Section Time Series Data

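Use an interaction term between the type and time dummies.

$$Y_{it} = B_0 + B_1 X_{it} + \delta_0 D_{2,t} + \delta_1 D_{iB} + \delta_{DD} (D_{2,t} \cdot D_{iB}) + e_{it}$$

$$\delta_{DD} = (\bar{Y}_{B,2} - \bar{Y}_{A,2}) - (\bar{Y}_{B,1} - \bar{Y}_{A,1})$$

The first term is the difference “after”; the second term is the difference “before”.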

Page 17: Pooled Cross-Section Time Series Data

Difference Coefficient

Also known as the “Average Treatment Effect”.

Can also be written as

$$\delta_{DD} = (\bar{Y}_{B,2} - \bar{Y}_{B,1}) - (\bar{Y}_{A,2} - \bar{Y}_{A,1})$$

The first term is the treatment impact on the ‘treated’ group (B); the second term is the change over the same period in the control group (A).

Page 18: Pooled Cross-Section Time Series Data

D-in-D example

i   t   Type   D_B   D2_T   D_B*D2_T   Y_it   X_it
1   1    A      0     0        0        72     98
1   2    A      0     1        0        75    102
2   1    B      1     0        0        31     40
2   2    B      1     1        1        26     39
3   1    B      1     0        0        55     66
3   2    B      1     1        1        62     70
4   1    A      0     0        0        41     59
4   2    A      0     1        0        45     60

Page 19: Pooled Cross-Section Time Series Data

From simple example

reg y db d2 dd

           y |   Coef.   Std. Err.       t
-------------+-----------------------------
          db |   -13.5    21.6015    -0.62
          d2 |     3.5    21.6015     0.16
          dd |    -2.5   30.54914    -0.08
       _cons |    56.5   15.27457     3.70

Mean of y for type b when t=2: 44.00
Mean of y for type a when t=2: 60.00
Mean of y for type b when t=1: 43.00
Mean of y for type a when t=1: 56.50

Coefficient = (44.00 - 60.00) - (43.00 - 56.50)
            = (-16) - (-13.5) = -2.5
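The four cell means and the resulting difference-in-differences coefficient can be recomputed from the example panel. The sketch below evaluates the estimator both ways (across types, then over time, and the reverse); `cell_mean` is a hypothetical helper name.

```python
# Example panel from the slides: (i, t, type, y).
panel = [
    (1, 1, "A", 72), (1, 2, "A", 75),
    (2, 1, "B", 31), (2, 2, "B", 26),
    (3, 1, "B", 55), (3, 2, "B", 62),
    (4, 1, "A", 41), (4, 2, "A", 45),
]

def cell_mean(group, period):
    """Mean of y for one type in one period."""
    values = [y for _, t, g, y in panel if g == group and t == period]
    return sum(values) / len(values)

# Difference "after" minus difference "before".
dd_across_types = (cell_mean("B", 2) - cell_mean("A", 2)) - (
    cell_mean("B", 1) - cell_mean("A", 1)
)
# Change for the treated group minus change for the control group.
dd_over_time = (cell_mean("B", 2) - cell_mean("B", 1)) - (
    cell_mean("A", 2) - cell_mean("A", 1)
)

print(dd_across_types, dd_over_time)  # -2.5 -2.5
```

Both orderings give the same number, which is the coefficient on `dd` in the regression output.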

Page 20: Pooled Cross-Section Time Series Data

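[Figure: group means $\bar{Y}_{A,1}$, $\bar{Y}_{A,2}$, $\bar{Y}_{B,1}$, $\bar{Y}_{B,2}$ plotted at t=1 and t=2]

$$\delta_{DD} = (\bar{Y}_{B,2} - \bar{Y}_{B,1}) - (\bar{Y}_{A,2} - \bar{Y}_{A,1})$$

How much more did the treatment group (B) outcome increase than the control group (A) outcome from time 1 to time 2?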

Page 21: Pooled Cross-Section Time Series Data

Panel Data Problem! Unobserved Heterogeneity

There exist characteristics of each individual that persist over time which cannot be included in the regression (unobservable in available data), but which nonetheless impact the observed variation in our dependent variable.

Page 22: Pooled Cross-Section Time Series Data

Composite Errors

These time-invariant unobserved effects are best modeled as a component in the regression error term.

It is this “composite error” approach that sets apart panel regression from OLS.

Page 23: Pooled Cross-Section Time Series Data

Examples

Unobservable motivational skills of the firm manager in a production function.

Skills, charisma, connections, nepotism in a wage model.

Levels of unobserved macro-level institutional corruption or inefficiency in a cross-sectional growth model.

Page 24: Pooled Cross-Section Time Series Data

The Composite Error Model

$$Y_{it} = B_0 + B_1 X_{it} + v_{it}$$

where $v_{it} = u_{it} + a_i$ is the composite error, and:

$u_{it}$ is the random, time-varying idiosyncratic error.

$a_i$ is the time-invariant error component.

Page 25: Pooled Cross-Section Time Series Data

The Composite Error Problems

1.) If COV($a_i$, $X_{it}$) ≠ 0, then OLS estimates will be biased.

Very much like simultaneous equations (endogeneity) bias, but here covariance with the error term will only involve cross-sectional variation.

Page 26: Pooled Cross-Section Time Series Data

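Composite Error Bias

$$Y_{it} = B_0 + B_1 X_{it} + a_i + u_{it}$$

$$\hat{B}_1 = B_1 + \frac{\sum (X_{it} - \bar{X})(a_i + u_{it})}{\sum (X_{it} - \bar{X})^2}$$

$$E(\hat{B}_1) = B_1 + \frac{E\left[\sum (X_{it} - \bar{X})\, a_i\right]}{\sum (X_{it} - \bar{X})^2} + \frac{E\left[\sum (X_{it} - \bar{X})\, u_{it}\right]}{\sum (X_{it} - \bar{X})^2}$$

Dividing each numerator and denominator by the number of observations:

$$E(\hat{B}_1) = B_1 + \frac{COV(X_{it}, a_i)}{VAR(X_{it})} + \frac{COV(X_{it}, u_{it})}{VAR(X_{it})}$$

Even when COV($X_{it}$, $u_{it}$) = 0, the estimate is biased whenever COV($X_{it}$, $a_i$) ≠ 0.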

Page 27: Pooled Cross-Section Time Series Data

Examples:

1. Manager charisma correlated with firm size in production function.

2. Nepotism/networking correlated with education in wage equation.

3. Institutional quality associated with development in corruption equation.

Page 28: Pooled Cross-Section Time Series Data

2.) Since $a_i$ represents a time-invariant component of the error, composite errors will be correlated over time.

Serial Correlation is the result: Corr($v_{it}$, $v_{i,t+s}$) ≠ 0

Estimates will not be biased, but goodness of fit and significance of coefficients will be overstated.

Page 29: Pooled Cross-Section Time Series Data

How to deal with the Composite Error problem?

Pooled OLS - do nothing about it.

First Difference - eliminate $a_i$.

Dummy Variables - estimate the $a_i$ when N is small.

Fixed Effects - control for $a_i$ when N is large, without estimating each one.

Random Effects - account for serial correlation.

Page 30: Pooled Cross-Section Time Series Data

First Difference Transformation (two period panel) with Time dummy

$$Y_{it} = B_0 + \delta_0 D_{Tt} + B_1 X_{it} + a_i + u_{it}$$

For Period 2:

$$Y_{i2} = (B_0 + \delta_0) + B_1 X_{i2} + a_i + u_{i2}$$

For Period 1:

$$Y_{i1} = B_0 + B_1 X_{i1} + a_i + u_{i1}$$

First Difference: $\Delta Y_i = Y_{i2} - Y_{i1}$

$$\Delta Y_i = \delta_0 + B_1 (X_{i2} - X_{i1}) + (u_{i2} - u_{i1})$$

$$\Delta Y_i = \delta_0 + B_1 (\Delta X_i) + \Delta u_i$$
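Applied to the two-period example panel, the first-difference regression can be sketched as follows. The code regresses $\Delta Y_i$ on $\Delta X_i$ with an intercept (the intercept plays the role of $\delta_0$); the resulting numbers are computed here for illustration and do not appear in the original slides.

```python
# Example panel from the slides, keyed by unit: ((y1, x1), (y2, x2)).
panel = {
    1: ((72, 98), (75, 102)),
    2: ((31, 40), (26, 39)),
    3: ((55, 66), (62, 70)),
    4: ((41, 59), (45, 60)),
}

# First differences: a_i cancels because it is the same in both periods.
dy = [p2[0] - p1[0] for p1, p2 in panel.values()]
dx = [p2[1] - p1[1] for p1, p2 in panel.values()]

n = len(dy)
dy_bar = sum(dy) / n
dx_bar = sum(dx) / n

# Simple OLS of dy on dx with an intercept.
sxy = sum((x - dx_bar) * (y - dy_bar) for x, y in zip(dx, dy))
sxx = sum((x - dx_bar) ** 2 for x in dx)
b1_fd = sxy / sxx
delta0_fd = dy_bar - b1_fd * dx_bar

print(dy, dx)                                # [3, -5, 7, 4] [4, -1, 4, 1]
print(round(b1_fd, 4), round(delta0_fd, 4))  # 1.7222 -1.1944
```

Note that only N = 4 observations remain after differencing, one per unit.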

Page 31: Pooled Cross-Section Time Series Data

First Difference

Transformation eliminates $a_i$ terms.

Corrects for heterogeneity bias and serial correlation.

Problems:
• 1. Eliminates all time-invariant variables (type dummies)
• 2. Eliminates time dimension in two period panel (reduces T by 1 in general)

Page 32: Pooled Cross-Section Time Series Data

“Type” Dummy Variables for each i

If $a_i$ terms are viewed as coefficients to be estimated, a dummy can be constructed that uniquely identifies each individual in the sample.

Dummy coefficient will represent the effect of the sum of all unobserved attributes.

Page 33: Pooled Cross-Section Time Series Data

Type Dummies

Solves the ‘time invariant bias’ problem by removing $a_i$ from the error component, and directly estimating the effects.

Obvious problem is that degrees of freedom are vastly reduced. Requires a large number of time periods relative to cross-sectional units.

Page 34: Pooled Cross-Section Time Series Data

Example: 4 country panel over 250 months

Step 1: append the separate country data files:

use c:/stata627/nfa/canada.dta
append using c:/stata627/nfa/italy.dta
append using c:/stata627/nfa/japan.dta
append using c:/stata627/nfa/uk.dta
tsset code time

Page 35: Pooled Cross-Section Time Series Data

Dummy Example - estimates of $a_i$

xi: reg y cpi r er i.code
i.code            _Icode_1-5          (naturally coded; _Icode_1 omitted)

Number of obs =     990
Prob > F      =  0.0000
R-squared     =  0.7464
Adj R-squared =  0.7448

------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|
-------------+----------------------------------------
         cpi |   .3817633   .0117365    32.53   0.000
           r |  -.4944136   .0780945    -6.33   0.000
          er |  -.0196729   .0014589   -13.49   0.000
    _Icode_3 |   26.41765   2.053128    12.87   0.000
    _Icode_4 |  -12.51685   .6298041   -19.87   0.000
    _Icode_5 |  -1.729212   .5753217    -3.01   0.003
       _cons |   67.36739   1.632653    41.26   0.000
------------------------------------------------------

Code 1 = Canada, omitted
Code 3 = Italy, positive estimate of $a_i$
Code 4 = Japan, negative $a_i$
Code 5 = UK, negative $a_i$

Page 36: Pooled Cross-Section Time Series Data

Fixed Effects

Assume CORR($a_i$, $X_{it}$) ≠ 0, but CORR($u_{it}$, $X_{it}$) = 0.

An alternative to the first difference transformation is the “Time De-meaning” transformation of the fixed effects model.

Results in a model essentially identical to the Dummy model, without having to estimate N-1 dummy coefficients.

Page 37: Pooled Cross-Section Time Series Data

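Fixed Effects Transformation

(1) is the original model:

$$(1)\quad y_{it} = B_0 + B_1 x_{it} + a_i + u_{it}$$

Average for each individual over time:

$$\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}$$

$$(2)\quad \bar{y}_i = B_0 + B_1 \bar{x}_i + a_i + \bar{u}_i$$

(2) is the “between” equation: it only shows variation between individuals, taking out the time element.

Subtract the mean from the level each time period:

$$(y_{it} - \bar{y}_i) = (B_0 - B_0) + B_1 (x_{it} - \bar{x}_i) + (a_i - a_i) + (u_{it} - \bar{u}_i)$$

$$(3)\quad (y_{it} - \bar{y}_i) = B_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i), \quad \text{or} \quad \ddot{y}_{it} = B_1 \ddot{x}_{it} + \ddot{u}_{it}$$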

Page 38: Pooled Cross-Section Time Series Data

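Fixed Effects Regression is equivalent to running OLS on Equation 3:

$$(3)\quad (y_{it} - \bar{y}_i) = B_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i), \quad \text{or} \quad \ddot{y}_{it} = B_1 \ddot{x}_{it} + \ddot{u}_{it}$$

Because CORR($\ddot{x}_{it}$, $\ddot{u}_{it}$) = 0, $\hat{B}_1^{FE}$ is unbiased.

This is also known as the “within” estimation equation, as it shows the variation within a group over time.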

Page 39: Pooled Cross-Section Time Series Data

Fixed Effects Coefficients

Will have the same “two-dimension” interpretation as pooled OLS.

Variation in the transformed variables is the same as in $Y_{it}$ and $X_{it}$.

$$B_1 = \frac{\Delta \ddot{Y}_{it}}{\Delta \ddot{X}_{it}} = \frac{\Delta Y_{it}}{\Delta X_{it}}$$

Page 40: Pooled Cross-Section Time Series Data

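Fixed Effects Transformation With Time-Invariant Dummy Independent Variable

$$(1)\quad y_{it} = B_0 + B_1 x_{it} + \delta_0 D_i + a_i + u_{it}, \quad D_i \in (0, 1) \text{ and time invariant for each } i$$

Average for each individual over time:

$$\bar{D}_i = \frac{1}{T} \sum_{t=1}^{T} D_i = \frac{1}{T} \, T D_i = D_i$$

$$(2)\quad \bar{y}_i = B_0 + B_1 \bar{x}_i + \delta_0 D_i + a_i + \bar{u}_i$$

$$(y_{it} - \bar{y}_i) = (B_0 - B_0) + B_1 (x_{it} - \bar{x}_i) + \delta_0 (D_i - D_i) + (a_i - a_i) + (u_{it} - \bar{u}_i)$$

$$(3)\quad (y_{it} - \bar{y}_i) = B_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i), \quad \text{or} \quad \ddot{y}_{it} = B_1 \ddot{x}_{it} + \ddot{u}_{it}$$

Problem: Both $a_i$ and $D_i$ are eliminated.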

Page 41: Pooled Cross-Section Time Series Data

Example: Two Period Panel Data (N=4, T=2)

i   t   Y_it   mean(Y_i)   Y_it - mean(Y_i)
1   1    72      73.5          -1.5
1   2    75      73.5           1.5
2   1    31      28.5           2.5
2   2    26      28.5          -2.5
3   1    55      58.5          -3.5
3   2    62      58.5           3.5
4   1    41      43            -2
4   2    45      43             2
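The de-meaned column in this table, and the within slope it leads to, can be reproduced with a few lines of Python. This is a sketch under one stated assumption: the within regression here includes only X (no time dummy), so the slope is computed for illustration and is not a figure reported in the slides. `unit_means` is a hypothetical helper.

```python
from collections import defaultdict

# Two-period example panel from the slides: (i, t, y, x).
panel = [
    (1, 1, 72, 98), (1, 2, 75, 102),
    (2, 1, 31, 40), (2, 2, 26, 39),
    (3, 1, 55, 66), (3, 2, 62, 70),
    (4, 1, 41, 59), (4, 2, 45, 60),
]

def unit_means(column):
    """Time average of one column (2 = y, 3 = x) for each unit i."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in panel:
        totals[row[0]] += row[column]
        counts[row[0]] += 1
    return {i: totals[i] / counts[i] for i in totals}

y_bar, x_bar = unit_means(2), unit_means(3)

# Time de-meaning: subtract each unit's own mean from its observations.
y_dd = [row[2] - y_bar[row[0]] for row in panel]
x_dd = [row[3] - x_bar[row[0]] for row in panel]

# Within (fixed effects) slope: OLS through the origin on de-meaned data.
b1_fe = sum(xd * yd for xd, yd in zip(x_dd, y_dd)) / sum(xd ** 2 for xd in x_dd)

print(y_dd)             # [-1.5, 1.5, 2.5, -2.5, -3.5, 3.5, -2.0, 2.0]
print(round(b1_fe, 4))  # 1.4412
```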

Page 42: Pooled Cross-Section Time Series Data

Goodness of Fit

A fixed effects regression returns three “R-square” measures. Each is actually a squared correlation between predicted and observed values:

1. Within $R^2$: fitted de-meaned $y_{it}$

2. Between $R^2$: fitted $\bar{y}_i$

3. Overall $R^2$: fitted $y_{it}$ (pooled OLS)

Page 43: Pooled Cross-Section Time Series Data

Panel Regressions in Stata

XT = cross-section time series.

“xtreg y x, fe” will run a panel fixed effects regression.

Must declare your “i” and “t” identifiers:
• tsset code time, for example.

Unfortunately, Stata refers to the time-invariant error component (our $a_i$) as u_i.

Page 44: Pooled Cross-Section Time Series Data

Fixed Effects Stata Example

xtreg y cpi r er, fe

Fixed-effects (within) regression               Number of obs      =       990
Group variable (i): code                        Number of groups   =         4

R-sq:  within  = 0.7071                         Obs per group: min =       244
       between = 0.0335                                        avg =     247.5
       overall = 0.1827                                        max =       250

                                                F(3,983)           =    791.14
corr(u_i, Xb)  = -0.7495                        Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         cpi |   .3817633   .0117365    32.53   0.000     .3587318    .4047948
           r |  -.4944136   .0780945    -6.33   0.000    -.6476647   -.3411625
          er |  -.0196729   .0014589   -13.49   0.000    -.0225358   -.0168101
       _cons |   70.49544   1.529625    46.09   0.000     67.49374    73.49715
-------------+----------------------------------------------------------------
     sigma_u |  16.538008   (std. dev. of time-invariant error)
     sigma_e |  6.3818613   (std. dev. of idiosyncratic error)
         rho |  .87038904   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(3, 983) =   362.02              Prob > F = 0.0000

Page 45: Pooled Cross-Section Time Series Data

Random Effects

Assumes CORR($a_i$, $X_{it}$) = 0.

Therefore, OLS coefficients will not suffer the “composite error bias” that was assumed with Fixed Effects.

We do not need to eliminate the $a_i$ terms.

Although the $a_i$ terms do not truly have to be “randomly” assigned, there is no structural relationship between $a_i$ and $X_{it}$ in a correctly specified model.

Page 46: Pooled Cross-Section Time Series Data

Random Effects

Even when CORR($a_i$, $X_{it}$) = 0, we still have to account for the serial correlation introduced by the $a_i$ error component.

A “Quasi-demeaned” data transformation is used to accomplish this, wherein the $a_i$ are altered but not eliminated.

A bonus is that time-invariant dummies are not eliminated.

Page 47: Pooled Cross-Section Time Series Data

Random Effects Assumptions

1. $E(a_i \mid X_{it}) = E(a_i) = 0$
• independence of the $a_i$’s and X’s: cov($a_i$, $X_{it}$) = 0

2. $E(u_{it} \mid X_{it}, a_i) = 0$

3. $E(u_{it} u_{is}) = cov(u_{it}, u_{is}) = 0$ for all t ≠ s.

4. $E(u_{it}^2 \mid X_{it}, a_i) = \sigma_u^2$ = constant

5. $E(a_i^2 \mid X_{it}) = Var(a_i) = \sigma_a^2$

Page 48: Pooled Cross-Section Time Series Data

Random Effects

Under the preceding criteria, the composite error does not violate the OLS assumptions needed for unbiased estimates.

Unnecessarily eliminating the $a_i$ terms will cause estimates to be inefficient.

Don’t use Fixed Effects unless warranted.

Page 49: Pooled Cross-Section Time Series Data

Random Effects

However, running Pooled OLS will not be appropriate because the composite errors are still serially correlated over time.

It can be shown that:

$$corr(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}, \quad t \neq s$$

Where, again: $v_{it} = u_{it} + a_i$
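As a numerical check, this formula can be evaluated with the variance components reported in the fixed effects Stata output earlier in the deck. Recall that Stata labels the time-invariant component `sigma_u` and the idiosyncratic component `sigma_e`; the Python variable names below follow the slides' notation instead.

```python
# Values taken from the fixed effects Stata output in the slides.
sigma_a = 16.538008  # Stata's sigma_u: std. dev. of the time-invariant a_i
sigma_u = 6.3818613  # Stata's sigma_e: std. dev. of the idiosyncratic u_it

# Serial correlation of the composite error v_it = u_it + a_i, for t != s.
rho = sigma_a ** 2 / (sigma_a ** 2 + sigma_u ** 2)

print(round(rho, 4))  # 0.8704
```

This agrees with the `rho` line of that output (.87038904), the fraction of variance due to the time-invariant component.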

Page 50: Pooled Cross-Section Time Series Data

Random Effects

Random effects transformation is more complicated than FD or FE, but the basic idea is to eliminate serial correlation in the error term by using information on the variances of the time-invariant ($a_i$) and idiosyncratic ($u_{it}$) errors.

Page 51: Pooled Cross-Section Time Series Data

Random Effects (RE)

Transformation results in a weighted average of the estimates provided by the “within” and “between” estimators.

Page 52: Pooled Cross-Section Time Series Data

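RE Transformation

$$(1)\quad y_{it} = B_0 + B_1 x_{it} + a_i + u_{it}$$

Define a weighted average:

$$\lambda \bar{y}_i = \lambda B_0 + \lambda B_1 \bar{x}_i + \lambda a_i + \lambda \bar{u}_i$$

then subtract from (1):

$$(y_{it} - \lambda \bar{y}_i) = (1 - \lambda) B_0 + B_1 (x_{it} - \lambda \bar{x}_i) + (1 - \lambda) a_i + (u_{it} - \lambda \bar{u}_i)$$

where

$$\hat{\lambda}_i = 1 - \sqrt{\frac{\hat{\sigma}_u^2}{T_i \hat{\sigma}_a^2 + \hat{\sigma}_u^2}}$$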

Page 53: Pooled Cross-Section Time Series Data

It can be shown that the composite error term $v_{it}$ augmented by the weighting term $\lambda$ (lambda), that is $\tilde{v}_{it} = v_{it} - \lambda \bar{v}_i$, will NOT suffer from serial correlation.

Corr($\tilde{v}_{it}$, $\tilde{v}_{is}$) = 0

Given

$$\hat{\lambda}_i = 1 - \sqrt{\frac{\hat{\sigma}_u^2}{T_i \hat{\sigma}_a^2 + \hat{\sigma}_u^2}}$$

Page 54: Pooled Cross-Section Time Series Data

NOTE:

If var($a_i$) = 0, meaning $a_i$ is always zero (no time-invariant effects), then lambda equals 0 and RE regression is equivalent to the Pooled OLS equation (1): all lambda-weighted terms drop out.

As $\sigma_a^2$ dominates $\sigma_u^2$, the $a_i$ terms become more important, $\lambda$ goes to 1, and RE → FE.

Given

$$\hat{\lambda}_i = 1 - \sqrt{\frac{\hat{\sigma}_u^2}{T_i \hat{\sigma}_a^2 + \hat{\sigma}_u^2}}$$
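The two limiting cases in this note can be seen numerically. The sketch below assumes the standard square-root form of the weight, $\lambda = 1 - \sqrt{\sigma_u^2 / (T \sigma_a^2 + \sigma_u^2)}$; the function name `re_lambda` and all variance values are hypothetical and chosen only for illustration.

```python
import math

def re_lambda(sigma_a2, sigma_u2, periods):
    """Quasi-demeaning weight for a unit observed in `periods` time periods.

    sigma_a2: variance of the time-invariant component a_i
    sigma_u2: variance of the idiosyncratic component u_it (must be > 0)
    """
    return 1.0 - math.sqrt(sigma_u2 / (periods * sigma_a2 + sigma_u2))

# Hypothetical variances with T = 10.
no_effect = re_lambda(0.0, 4.0, 10)    # var(a_i) = 0: RE collapses to pooled OLS
moderate = re_lambda(1.0, 4.0, 10)
dominant = re_lambda(100.0, 4.0, 10)   # sigma_a^2 dominates: RE approaches FE

print(no_effect)           # 0.0
print(round(moderate, 4))  # 0.4655
print(round(dominant, 4))  # 0.9369
```

Setting lambda to 0 leaves the data untransformed (pooled OLS), while lambda equal to 1 subtracts the full unit mean (the fixed effects transformation).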

Page 55: Pooled Cross-Section Time Series Data

RE Stata Example (N=4)

xtreg y cpi r er

Random-effects GLS regression                   Number of obs      =       990
Group variable (i): code                        Number of groups   =         4

R-sq:  within  = 0.6252                         Obs per group: min =       244
       between = 0.7702                                        avg =     247.5
       overall = 0.4662                                        max =       250

Random effects u_i ~ Gaussian                   Wald chi2(3)       =    861.17
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|
-------------+----------------------------------------
         cpi |   .3468475   .0158341    21.91   0.000
           r |   .0072637   .1077631     0.07   0.946
          er |  -.0002592   .0005834    -0.44   0.657
       _cons |   61.17895   2.168505    28.21   0.000
-------------+----------------------------------------
     sigma_u |          0
     sigma_e |  6.3818613
         rho |          0   (fraction of variance due to u_i)

Page 56: Pooled Cross-Section Time Series Data

Fixed vs. Random Effects

As a practical matter, Random Effects is preferred when key explanatory variables are time-invariant.

The Fixed Effects view is that the unobserved heterogeneity is in itself an explanatory variable that ideally would have a coefficient to be estimated.

Page 57: Pooled Cross-Section Time Series Data

Fixed vs. Random Effects

The Random Effects view is that unobserved heterogeneity is “randomly assigned” to each cross-sectional entity and not correlated with other explanatory variables.

Page 58: Pooled Cross-Section Time Series Data

When to use FE vs. RE? The Hausman Coefficient Test

The logic of the test is the following:
• If CORR($a_i$, $X_{it}$) ≠ 0, then RE is biased.
• If CORR($a_i$, $X_{it}$) = 0, then both RE and FE are unbiased, but it can be shown that RE is more efficient (smaller standard errors of coefficients).
• Therefore, if the FE coefficients are significantly different from the RE coefficients, then RE must be biased, so use FE.
• If FE coefficients are not significantly different from RE, then neither is biased, so use RE.

Page 59: Pooled Cross-Section Time Series Data

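General Hausman Test

Test the equality of the vector of coefficients:

$$\hat{\beta}_{FE} = (\hat{\beta}_1^{FE},\ \hat{\beta}_2^{FE},\ \hat{\beta}_3^{FE})', \qquad \hat{\beta}_{RE} = (\hat{\beta}_1^{RE},\ \hat{\beta}_2^{RE},\ \hat{\beta}_3^{RE})'$$

$$V(\hat{\beta}) = \text{vector of variance terms for each coefficient}$$

Test statistic:

$$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'\,[V(\hat{\beta}_{FE}) - V(\hat{\beta}_{RE})]^{-1}\,(\hat{\beta}_{FE} - \hat{\beta}_{RE})$$

H is distributed Chi-square with k degrees of freedom.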

Page 60: Pooled Cross-Section Time Series Data

Single Coefficient Version

If we are primarily interested in a single parameter, there is a t-statistic version of the Hausman test.

Let $B_1^{FE}$ and $B_1^{RE}$ be the fixed- and random-effects coefficients for $X_{1,it}$:

$$t = \frac{B_1^{FE} - B_1^{RE}}{\left[se(B_1^{FE})^2 - se(B_1^{RE})^2\right]^{1/2}}$$

where t is asymptotically normally distributed.
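The single-coefficient statistic can be sketched in a few lines of Python. This is only an illustration: the function name `hausman_t` and the input values are hypothetical and do not come from the slides' data. The function refuses to return a value when se(FE)² does not exceed se(RE)², since the square root is then undefined.

```python
import math

def hausman_t(b_fe, se_fe, b_re, se_re):
    """Single-coefficient Hausman t statistic.

    t = (b_fe - b_re) / sqrt(se_fe^2 - se_re^2)
    """
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        # RE is not more efficient than FE in this sample,
        # so the statistic cannot be computed.
        raise ValueError("se_fe^2 - se_re^2 must be positive")
    return (b_fe - b_re) / math.sqrt(var_diff)

# Hypothetical estimates, chosen only to illustrate the arithmetic.
t_stat = hausman_t(b_fe=0.50, se_fe=0.05, b_re=0.40, se_re=0.03)
print(round(t_stat, 3))
```

The result is compared with standard normal critical values, as the slide states.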

Page 61: Pooled Cross-Section Time Series Data

Note: Hausman Test Problem

Most of the time the Hausman test works fine, however…

The test statistic is based on the assumption that RE is more efficient (estimates have a smaller variance) than FE.

Page 62: Pooled Cross-Section Time Series Data

While this can be shown to be asymptotically true, it may not hold for a given sample.

If this is the case, then the test statistic is negative, and cannot be interpreted as a Chi-square.

This is why it is important to type:
• 'hausman unbiased efficient'

where 'unbiased' is the vector of FE coefficients and 'efficient' is the vector of RE coefficients.
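Why the argument order matters can be seen in the one-coefficient case, where the statistic reduces to (b - B)² / (V_b - V_B). The sketch below uses hypothetical numbers (not taken from the slides) and a hypothetical helper `hausman_scalar`; it simply shows that swapping the two estimators flips the sign of the variance difference and therefore of the statistic.

```python
def hausman_scalar(b_consistent, v_consistent, b_efficient, v_efficient):
    """One-coefficient Hausman statistic: (b - B)^2 / (V_b - V_B)."""
    return (b_consistent - b_efficient) ** 2 / (v_consistent - v_efficient)

# Hypothetical FE and RE estimates with their variances.
b_fe, v_fe = 0.50, 0.0025
b_re, v_re = 0.40, 0.0009

# Correct order: consistent estimator (FE) first, efficient estimator (RE) second.
h_right = hausman_scalar(b_fe, v_fe, b_re, v_re)

# Swapped order: the variance difference is negative, so the "chi-square" is too.
h_swapped = hausman_scalar(b_re, v_re, b_fe, v_fe)

print(round(h_right, 2), round(h_swapped, 2))
```

A negative value can also arise with the correct order when V_FE happens to be smaller than V_RE in a given sample, which is the finite-sample problem the slide describes.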

Page 63: Pooled Cross-Section Time Series Data

Hausman Test Interpretation

H0: β_FE = β_RE (difference in coefficients is NOT systematic)
HA: β_FE ≠ β_RE.

If H > critical value, we reject H0, and
• conclude that since β_FE ≠ β_RE,
• Random Effects is biased, therefore
• CORR(a_i, X_it) ≠ 0, and
• Fixed Effects is the appropriate model.

Page 64: Pooled Cross-Section Time Series Data

Hausman Test in Stata

xtreg y cpi r er, fe
estimates store fe
xtreg y cpi r er
estimates store re
hausman fe re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
         cpi |    .3817633     .3468475        .0349158               .
           r |   -.4944136     .0072637       -.5016774               .
          er |   -.0196729    -.0002592       -.0194137        .0013371
------------------------------------------------------------------------------
        b = consistent under Ho and Ha; obtained from xtreg
        B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

              chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                      = 162.38
            Prob>chi2 = 0.0000
            (V_b-V_B is not positive definite)

Reject H0 in this case, so go with Fixed Effects.

Page 65: Pooled Cross-Section Time Series Data

Lagrange Multiplier Test for Random Effects

Essentially, this is a derivation of a test for heteroskedasticity in a panel composite error setting, where v_it = a_i + u_it.

Assume var(u_it) is constant, and u_it is not correlated with X_it.

Then any correlation between var(v_it) and X_it must be due to the time-invariant error a_i.

Page 66: Pooled Cross-Section Time Series Data

Stata Note for Panel Regressions

You will notice that running FE / RE regressions with large N can be time consuming, which is really annoying during the specification search process.

This is because each regression requires Stata to perform the ‘de-meaning’ transformation for each observation from the original data.

Page 67: Pooled Cross-Section Time Series Data

Stata Note

The 'xtdata' command allows you to create a new data set of the transformed variables.

Running OLS on the transformed variables is equivalent to the transformed FE/RE regression.

Typing 'xtdata y x1 x2, fe' will replace the data in memory with the fixed-effect de-meaned values of the specified variables for each observation; save the result under a new name to keep it as a separate .dta file.

Page 68: Pooled Cross-Section Time Series Data

Extensions to Panel Regression

1.) 2SLS/IV with panel
    xtivreg y x1 (x2=z), fe

2.) Cluster effects for cross-sectional data.

3.) Auto-correlated idiosyncratic errors (u_it)

Page 69: Pooled Cross-Section Time Series Data

Extension 1: IV Panel

When an independent variable is endogenous in a panel regression, each stage of the two stage least squares process must take into account the composite error issue.

i.e. the first stage and second stage will each be an RE or FE regression, depending on which is appropriate.

Page 70: Pooled Cross-Section Time Series Data

Y_it = B0 + B1 X_it + a_i + u_it

The fixed effects transformation will address the issue of COV(X_it, a_i) ≠ 0.

But what about when COV(X_it, u_it) ≠ 0?

Page 71: Pooled Cross-Section Time Series Data

Panel 2SLS

$$(1)\quad y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$$

$$CORR(x_{it}, a_i) \neq 0, \qquad CORR(x_{it}, u_{it}) \neq 0$$

Define variable $z_{it}$ such that

$$CORR(z_{it}, a_i) \neq 0 \quad \text{but} \quad CORR(z_{it}, u_{it}) = 0.$$

Therefore $z_{it}$ will be exogenous in (1), but will require the fixed effects transformation to be used as an effective instrument.

Page 72: Pooled Cross-Section Time Series Data

First Stage FE

$$\ddot{x}_{it} = \pi_0 + \pi_1 \ddot{z}_{it} + e_{it}, \quad \text{where } \ddot{z}_{it} = (z_{it} - \bar{z}_i)$$

$\hat{\pi}_1^{FE}$ is unbiased.

Save fitted, transformed values of $\hat{\ddot{x}}_{it}$.

Page 73: Pooled Cross-Section Time Series Data

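Second Stage FE

$$(1')\quad \ddot{y}_{it} = \beta_1 \hat{\ddot{x}}_{it} + \ddot{u}_{it}$$

$$CORR(\hat{\ddot{x}}_{it}, a_i) = 0 \quad \text{(because of the umlaut)}$$

$$CORR(\hat{\ddot{x}}_{it}, u_{it}) = 0 \quad \text{(because of the hat)}$$

$\hat{\beta}_1^{FE,2SLS}$ will be unbiased.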
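The two FE stages can be sketched by hand on a tiny panel. Everything below is a hypothetical illustration: the data are constructed (not from the slides) so that the true slope is 2, x is correlated with both a_i and u_it, and the de-meaned instrument is orthogonal to u_it. The constant is omitted because de-meaned variables have mean zero within each unit.

```python
def demean(values):
    """Within (fixed effects) transformation for one unit."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def slope_no_constant(x, y):
    """OLS slope of y on x through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical panel: y = 2*x + a_i + u_it, with x built from z, a_i and u_it.
panel = {
    "unit1": {"z": [1, 2, 3], "x": [12, 10, 14], "y": [35, 28, 39]},
    "unit2": {"z": [2, 4, 6], "x": [-4, 1, 0], "y": [-14, -1, -6]},
}

z_dd, x_dd, y_dd = [], [], []
for unit in panel.values():
    z_dd += demean(unit["z"])
    x_dd += demean(unit["x"])
    y_dd += demean(unit["y"])

# First stage: de-meaned x on de-meaned z, then save the fitted values.
pi_hat = slope_no_constant(z_dd, x_dd)
x_hat = [pi_hat * z for z in z_dd]

# Second stage: de-meaned y on the fitted, de-meaned x.
beta_fe_2sls = slope_no_constant(x_hat, y_dd)

# For comparison: plain FE, which ignores the correlation between x and u.
beta_fe_ols = slope_no_constant(x_dd, y_dd)

print(beta_fe_2sls, round(beta_fe_ols, 3))
```

In this constructed example the two-stage estimate recovers the slope of 2 while plain FE does not. Standard errors taken from a manual second stage are not valid, which is one reason to use a dedicated command such as the `xtivreg` call shown in the slides.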

Page 74: Pooled Cross-Section Time Series Data

Extension 2: Cluster Regression

Allows for a Fixed Effects transformation with single period cross-section data.

"Cluster-" or "group-" invariant errors replace "time-invariant" errors (a_i).

For example, there may be "within village effects" that will be the same for all households in Village A that differ from Village B.

Often can be controlled for with "cluster dummy" variables.

Page 75: Pooled Cross-Section Time Series Data

Cross Section Cluster Example

Household (i)   Village (j)   Consumption (Y_ij)   Income (X_ij)
      1              1               500                 750
      2              1               650                1000
      3              1               475                 725
      1              2               600                 700
      2              2               625                 750
      3              2               550                 600
      1              3               575                1100
      2              3               625                1200
      3              3               600                1000

Page 76: Pooled Cross-Section Time Series Data

Cluster Regression

Model: X_ij = observation for household i in village j

Y_ij = B0 + B1 X_ij + a_j + u_ij

The analogy to panel structure is that i acts like the time variable, and j acts like the cross-sectional identifier.

Multiple observations for a given village j.

a_j is the "cluster invariant error" or "village level fixed effect".

Page 77: Pooled Cross-Section Time Series Data

Fixed Effects for Cluster

Again, if there is correlation between the "cluster-invariant" error (a_j) and the independent variables (X_ij), then the coefficient estimates will be biased.

Fixed Effects transformation eliminates the a_j by subtracting the cluster mean from each observation.

$$(Y_{ij} - \bar{Y}_j) = B_1 (X_{ij} - \bar{X}_j) + (a_j - a_j) + (u_{ij} - \bar{u}_j)$$

$$\bar{Y}_j = \text{village level mean}$$

$$\ddot{Y}_{ij} = B_1^{FE} \ddot{X}_{ij} + \ddot{u}_{ij}$$

Page 78: Pooled Cross-Section Time Series Data

Cluster Effects Transformation

 i   j     y      x    ybar_j   y_umlat_ij   xbar_j   x_umlat_ij
 1   1    500    750   541.67     -41.67     825         -75
 2   1    650   1000   541.67     108.33     825         175
 3   1    475    725   541.67     -66.67     825        -100
 1   2    600    700   591.67      8.3333    683.33       16.67
 2   2    625    750   591.67     33.333     683.33       66.67
 3   2    550    600   591.67     -41.67     683.33      -83.3
 1   3    575   1100   600        -25        1100          0
 2   3    625   1200   600         25        1100        100
 3   3    600   1000   600          0        1100       -100
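The transformation in the table can be reproduced with a short Python sketch, shown here only as an illustration of the arithmetic (the slides themselves use Stata). The data are the nine village observations from the example; the helper name `cluster_means` is hypothetical.

```python
# Village data from the example: (household i, village j, consumption y, income x)
data = [
    (1, 1, 500, 750), (2, 1, 650, 1000), (3, 1, 475, 725),
    (1, 2, 600, 700), (2, 2, 625, 750), (3, 2, 550, 600),
    (1, 3, 575, 1100), (2, 3, 625, 1200), (3, 3, 600, 1000),
]

def cluster_means(rows):
    """Return {village: (ybar_j, xbar_j)}."""
    sums = {}
    for _i, j, y, x in rows:
        n, sy, sx = sums.get(j, (0, 0.0, 0.0))
        sums[j] = (n + 1, sy + y, sx + x)
    return {j: (sy / n, sx / n) for j, (n, sy, sx) in sums.items()}

means = cluster_means(data)

# De-meaned ("umlaut") values: subtract the village mean from each observation.
y_dd = [y - means[j][0] for _i, j, y, _x in data]
x_dd = [x - means[j][1] for _i, j, _y, x in data]

# OLS slope on the de-meaned data (no constant needed: both have mean zero).
slope = sum(xd * yd for xd, yd in zip(x_dd, y_dd)) / sum(xd * xd for xd in x_dd)
print(round(slope, 7))
```

The slope should agree with the .4759358 coefficient on x_umlat reported in the transformed OLS regression.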

Page 79: Pooled Cross-Section Time Series Data

Transformed OLS Regression

reg y_umlat x_umlat

      Source |       SS       df       MS              Number of obs =       9
-------------+------------------------------           F(  1,     7) =   27.86
       Model |  17649.2873     1  17649.2873           Prob > F      =   .0011
    Residual |  4434.04639     7  633.435199           R-squared     =   .7992
-------------+------------------------------           Adj R-squared =   .7705
       Total |  22083.3337     8  2760.41671           Root MSE      =  25.168

---------------------------------------------------------
     y_umlat |      Coef.   Std. Err.       t    P>|t|
-------------+-------------------------------------------
     x_umlat |   .4759358   .0901646     5.28   0.001
       _cons |   4.09e-07    8.38938     0.00   1.000
---------------------------------------------------------

Page 80: Pooled Cross-Section Time Series Data

FIXED EFFECTS

tsset j i
       panel variable:  j (strongly balanced)
        time variable:  i, 1 to 3
                delta:  1 unit

xtreg y x, fe

Fixed-effects (within) regression      Number of obs      =      9
Group variable: j                      Number of groups   =      3
R-sq:  within  = 0.7992                Obs per group: min =      3
       between = 0.0961                               avg =    3.0
       overall = 0.2517                               max =      3

                                       F(1,5)             =  19.90
corr(u_i, Xb)  = -0.8365               Prob > F           = 0.0066
---------------------------------------------------------
           y |      Coef.   Std. Err.       t    P>|t|
-------------+-------------------------------------------
           x |   .4759358   .1066842     4.46   0.007
       _cons |    163.978   93.28558     1.76   0.139
-------------+-------------------------------------------
     sigma_u |  95.865925
     sigma_e |  29.779343
         rho |  .91199744   (fraction of variance due to u_i)
---------------------------------------------------------
F test that all u_i=0:  F(2, 5) = 9.34      Prob > F = 0.0205
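The slope from `xtreg y x, fe` matches the OLS slope on the de-meaned variables, but the standard errors differ (.1066842 versus .0901646). A plausible reading, sketched below, is a degrees-of-freedom difference: OLS on the transformed data divides the residual sum of squares by 7, while the FE estimator also counts the three village means and divides by 5. The variable names are illustrative only.

```python
import math

n_obs, n_groups, n_slopes = 9, 3, 1
se_transformed_ols = 0.0901646   # Std. Err. reported by `reg y_umlat x_umlat`

df_ols = n_obs - n_slopes - 1          # 7: slope plus a constant
df_fe = n_obs - n_groups - n_slopes    # 5: slope plus three village means

# Rescale the OLS standard error to the FE degrees of freedom.
se_fe = se_transformed_ols * math.sqrt(df_ols / df_fe)
print(round(se_fe, 7))
```

The rescaled value should agree with the .1066842 standard error in the `xtreg` output to within rounding, and the same figure appears in the cluster-dummy regression, which also has 5 residual degrees of freedom.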

Page 81: Pooled Cross-Section Time Series Data

Cluster (village) Dummies

xi: reg y x i.j
i.j               _Ij_1-3             (naturally coded; _Ij_1 omitted)

      Source |       SS       df       MS              Number of obs =       9
-------------+------------------------------           F(  3,     5) =    8.88
       Model |  23621.5092     3   7873.8364           Prob > F      =  0.0191
    Residual |  4434.04635     5  886.809269           R-squared     =  0.8420
-------------+------------------------------           Adj R-squared =  0.7471
       Total |  28055.5556     8  3506.94444           Root MSE      =  29.779

-----------------------------------------------------------------
           y |      Coef.   Std. Err.       t    P>|t|
-------------+---------------------------------------------------
           x |   .4759358   .1066842     4.46   0.007
       _Ij_2 |   117.4242   28.62912     4.10   0.009    (Vlg2)
       _Ij_3 |  -72.54902   38.10424    -1.90   0.115    (Vlg3)
       _cons |   149.0196     89.678     1.66   0.157    (Vlg1)
-----------------------------------------------------------------

Page 82: Pooled Cross-Section Time Series Data

"predict ai, u" to view the estimated a_i

 i   j   _Ij_2   _Ij_3        ai
 1   1     0       0      -14.9584
 2   1     0       0      -14.9584
 3   1     0       0      -14.9584
 1   2     1       0      102.4658
 2   2     1       0      102.4658
 3   2     1       0      102.4658
 1   3     0       1      -87.5074
 2   3     0       1      -87.5074
 3   3     0       1      -87.5074
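The predicted village effects can be traced back to the village means. The sketch below is an illustration under the assumption that each effect equals the village mean of y, minus the FE slope times the village mean of x, minus the overall intercept; the slope is the value reported on the slides and the means are computed from the nine observations.

```python
b_fe = 0.4759358  # FE slope reported on the slides

# Village means (ybar_j, xbar_j) from the example data.
village_means = {
    1: (1625 / 3, 825.0),
    2: (1775 / 3, 2050 / 3),
    3: (600.0, 1100.0),
}

# Overall intercept: overall mean of y minus slope times overall mean of x.
# With a balanced design the overall means are the averages of the village means.
y_bar = sum(m[0] for m in village_means.values()) / len(village_means)
x_bar = sum(m[1] for m in village_means.values()) / len(village_means)
cons = y_bar - b_fe * x_bar

# Estimated village effect: what is left of each village mean after slope and intercept.
a_hat = {j: ybar - b_fe * xbar - cons for j, (ybar, xbar) in village_means.items()}

for j in sorted(a_hat):
    print(j, round(a_hat[j], 4))
```

Differences between these effects line up with the dummy-variable regression: the effect for village 2 minus that for village 1 should equal the 117.4242 coefficient on _Ij_2.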

Page 83: Pooled Cross-Section Time Series Data

Aside. . . "xtdes" command

xtdes

       j:  1, 2, ..., 3                        n =          3
       i:  1, 2, ..., 3                        T =          3
           Delta(i) = 1 unit
           Span(i)  = 3 periods
           (j*i uniquely identifies each observation)

Distribution of T_i:   min      5%     25%     50%     75%     95%     max
                         3       3       3       3       3       3       3

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------
        3    100.00  100.00 |  111
 ---------------------------+---------
        3    100.00         |  XXX

Page 84: Pooled Cross-Section Time Series Data

Extension 3: Autocorrelation of u_it's

Random Effects transformation eliminated autocorrelation amongst composite errors due to presence of a_i.

Fixed Effects eliminated autocorrelation due to a_i by eliminating the time-invariant error.

What if, in addition, u_it is autocorrelated? RE or FE alone will not address the issue.

Page 85: Pooled Cross-Section Time Series Data

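Panel FE Regression with AC

$$(1)\quad y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}, \qquad CORR(x_{it}, a_i) \neq 0$$

$$u_{it} = \rho\, u_{i,t-1} + \varepsilon_{it}, \qquad -1 < \rho < 1, \qquad \varepsilon_{it} \sim N(0, \sigma^2)$$

$$(2)\quad \ddot{y}_{it} = \beta_1 \ddot{x}_{it} + (u_{it} - \bar{u}_i)$$

$$(2.1)\quad \ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \rho\,(u_{i,t-1} - \bar{u}_{i,t-1}) + \ddot{\varepsilon}_{it}$$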

Page 86: Pooled Cross-Section Time Series Data

Equation (2.1) is now a linear AR(1) Model.

To solve, we need to use the Cochrane-Orcutt method of estimating ρ, then using the generalized difference equation to eliminate the term:

$$\rho\,(u_{i,t-1} - \bar{u}_{i,t-1})$$
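The two Cochrane-Orcutt ingredients, estimating ρ from residuals and then applying the generalized difference, can be sketched as follows. This is a simplified single pass on hypothetical residuals, not a full iterative implementation; the function names are illustrative, and Stata's `xtregar` may compute ρ differently.

```python
def estimate_rho(resid_by_unit):
    """Pooled regression of u_t on u_{t-1} (no constant), never pairing across units."""
    num = den = 0.0
    for u in resid_by_unit:
        for prev, cur in zip(u[:-1], u[1:]):
            num += prev * cur
            den += prev * prev
    return num / den

def quasi_difference(series, rho):
    """Generalized difference: v_t - rho * v_{t-1}; the first period is dropped."""
    return [cur - rho * prev for prev, cur in zip(series[:-1], series[1:])]

# Hypothetical FE residuals for two units that follow u_t = 0.5 * u_{t-1} exactly.
residuals = [[1.0, 0.5, 0.25], [-2.0, -1.0, -0.5]]

rho_hat = estimate_rho(residuals)
transformed = [quasi_difference(u, rho_hat) for u in residuals]
print(rho_hat, transformed)
```

In practice the same `quasi_difference` would be applied to y and x for each unit before re-running the regression, and the two steps would be repeated until the estimate of ρ settles.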

Page 87: Pooled Cross-Section Time Series Data

STATA to the rescue again!

The command:

"xtregar y x, fe"

will simultaneously transform the data to eliminate the a_i terms AND estimate ρ AND provide consistent standard errors with the generalized difference equation.

Page 88: Pooled Cross-Section Time Series Data

Xtregar Example from 4 country panel

xtregar y r cpi er, fe

FE (within) regression with AR(1) disturbances   Number of obs      =    986
Group variable: code                             Number of groups   =      4

R-sq:  within  = 0.0155                          Obs per group: min =    243
       between = 0.5840                                         avg =  246.5
       overall = 0.4567                                         max =    249

                                                 F(3,979)           =   5.13
corr(u_i, Xb)  = -0.1308                         Prob > F           = 0.0016
---------------------------------------------------------
           y |      Coef.   Std. Err.       t    P>|t|
-------------+-------------------------------------------
           r |  -.0362285   .0875633    -0.41   0.679
         cpi |   .2832925    .076438     3.71   0.000
          er |   .0015201   .0029347     0.52   0.605
       _cons |     68.766   .2288196   300.52   0.000
-------------+-------------------------------------------
      rho_ar |   .9718915
     sigma_u |  6.3918957
     sigma_e |  1.7246814
     rho_fov |  .93213626   (fraction of variance because of u_i)
---------------------------------------------------------
F test that all u_i=0:  F(3,979) = 2.14      Prob > F = 0.094

Page 89: Pooled Cross-Section Time Series Data

Stata Note: balancing your panel

It may be useful to use only those "entities" that appear in all time periods. Suppose T=20; use the following:

sort entity time
by entity: gen count=_N
keep if count==20
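The same count-and-keep logic, written as a small Python sketch for readers who want to see what the three Stata lines do. The rows and the function name `balance_panel` are hypothetical, and T is set to 2 to keep the example short.

```python
from collections import Counter

def balance_panel(rows, n_periods):
    """Keep only entities observed in exactly n_periods rows.

    Each row is (entity, period, value); one row per entity-period is assumed.
    """
    counts = Counter(entity for entity, _period, _value in rows)
    return [row for row in rows if counts[row[0]] == n_periods]

# Hypothetical unbalanced panel: entity "B" is missing period 2.
rows = [
    ("A", 1, 10.0), ("A", 2, 11.0),
    ("B", 1, 5.0),
    ("C", 1, 7.0), ("C", 2, 8.0),
]

balanced = balance_panel(rows, n_periods=2)
print(balanced)
```

As with the Stata version, this drops whole entities rather than filling in missing periods.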

Page 90: Pooled Cross-Section Time Series Data

Panel Data Management in STATA

Common problem is that original data is stored in "wide" or "rectangular" form, wherein values for a given year are stored in a separate column.

For example, in a cross-country panel, FDI in 2000 has one column, with each row representing a unique country. Likewise for FDI in 2001, etc.

Page 91: Pooled Cross-Section Time Series Data

Example of "wide" form data set

Countries     Code   fdi2000    fdi2001    fdi2002
Argentina       1    1.04E+10   2.17E+09   2.15E+09
Australia       2    1.36E+10   8.26E+09   1.77E+10
Austria         3    8.52E+09   5.91E+09   3.19E+08
Bangladesh      4    2.80E+08   7.90E+07   5.20E+07

Page 92: Pooled Cross-Section Time Series Data

Problem

In order to run a panel regression in STATA, we need data to be stored in "long" form.

Here, each row is identified by both a time period and country code. A variable like FDI will have a single column.

Page 93: Pooled Cross-Section Time Series Data

Example of "long" form data set

code   year   countries    fdi
  1    2000   Argentina    1.040e+10
  1    2001   Argentina    2.170e+09
  1    2002   Argentina    2.150e+09
  2    2000   Australia    1.360e+10
  2    2001   Australia    8.260e+09
  2    2002   Australia    1.770e+10

Page 94: Pooled Cross-Section Time Series Data

The "reshape" STATA command

Instead of copying and pasting in Excel, load the data into STATA as "wide" form, then transform.

The "reshape" command will generate the "time" variable for you, and combine separate time periods into a single column.

Page 95: Pooled Cross-Section Time Series Data

reshape long fdi, i(code) j(year)

Keys on specified variable, here "fdi".

Must declare cross-section identifier i.

Generates "within" group identifier j. Put new varname in parentheses. Typically j will represent time, but not necessarily.
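What the wide-to-long conversion does to the data can be sketched in Python. This only mimics the idea behind `reshape long fdi, i(code) j(year)` on two rows of the example; the function `reshape_long` is a hypothetical helper and is not how Stata implements the command.

```python
# Two rows of the "wide" example: one column per year.
wide = [
    {"countries": "Argentina", "code": 1,
     "fdi2000": 1.04e10, "fdi2001": 2.17e9, "fdi2002": 2.15e9},
    {"countries": "Australia", "code": 2,
     "fdi2000": 1.36e10, "fdi2001": 8.26e9, "fdi2002": 1.77e10},
]

def reshape_long(rows, stub, i, j):
    """Stack columns named <stub><number> into one column, adding j = <number>."""
    long_rows = []
    for row in rows:
        # Columns such as fdi2000: the stub followed by digits.
        stacked = [k for k in row if k.startswith(stub) and k[len(stub):].isdigit()]
        fixed = {k: v for k, v in row.items() if k not in stacked}
        for key in stacked:
            long_row = dict(fixed)                 # carry time-invariant columns along
            long_row[j] = int(key[len(stub):])     # e.g. "fdi2001" -> year 2001
            long_row[stub] = row[key]
            long_rows.append(long_row)
    return sorted(long_rows, key=lambda r: (r[i], r[j]))

long_rows = reshape_long(wide, stub="fdi", i="code", j="year")
for r in long_rows:
    print(r["code"], r["year"], r["countries"], r["fdi"])
```

Each country now contributes one row per year, with the year taken from the suffix of the original column name.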

Page 96: Pooled Cross-Section Time Series Data

Reshape Notes

In general, list all variables that must be combined into a single column.

You do not need to list time-invariant variables, but they will be converted to "long" as well.

Note that "reshape wide" will convert data from long to wide format.

Seems to be touchy about year values. '99 for 1999 is ok, but '00 for 2000 is a problem.

Page 97: Pooled Cross-Section Time Series Data

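Fixed Effects Logit

$$y_{it}^* = X_{it}\beta + a_i + u_{it}, \quad \text{where } y_{it} \in \{0, 1\}, \quad y_{it} = 1 \text{ if } y_{it}^* > 0$$

$$y_{it} = 1 \text{ if } X_{it}\beta + a_i + u_{it} > 0, \qquad \Pr(y_{it} = 1) = \Pr(X_{it}\beta + a_i + u_{it} > 0)$$

$$\Pr(u_{it} > -(X_{it}\beta + a_i)) = 1 - G(-(X_{it}\beta + a_i))$$

$$G(X_{it}\beta + a_i) = \frac{e^{X_{it}\beta + a_i}}{1 + e^{X_{it}\beta + a_i}}$$

$$\log L = y_{it}\,\log G(X_{it}\beta + a_i) + (1 - y_{it})\,\log[1 - G(X_{it}\beta + a_i)]$$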
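The logistic function G and the log-likelihood contribution can be evaluated with a few lines of Python. This is only a sketch of the formulas: the function names are hypothetical, the index values stand in for X_it β + a_i, and nothing here estimates the model.

```python
import math

def logistic_cdf(index):
    """G(v) = e^v / (1 + e^v), written in the equivalent form 1 / (1 + e^(-v))."""
    return 1.0 / (1.0 + math.exp(-index))

def log_likelihood(y, index_values):
    """Sum of y*log G(v) + (1 - y)*log[1 - G(v)] over observations.

    index_values holds v_it = X_it*beta + a_i for each observation.
    """
    total = 0.0
    for y_it, v_it in zip(y, index_values):
        g = logistic_cdf(v_it)
        total += y_it * math.log(g) + (1 - y_it) * math.log(1.0 - g)
    return total

# Hypothetical outcomes and index values for four observations.
y_obs = [1, 0, 1, 0]
v_obs = [0.0, 0.0, 2.0, -2.0]
print(round(log_likelihood(y_obs, v_obs), 4))
```

An estimator would search over β (and handle the a_i terms) to make this sum as large as possible.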