1. Descriptive Tools, Regression, Panel Data


Page 1: 1. Descriptive Tools, Regression, Panel Data

1. Descriptive Tools, Regression, Panel Data

Page 2: 1. Descriptive Tools, Regression, Panel Data

Model Building in Econometrics

• Parameterizing the model
• Nonparametric analysis
• Semiparametric analysis
• Parametric analysis

• Sharpness of inferences follows from the strength of the assumptions

A Model Relating (Log)Wage to Gender and Experience

Page 3: 1. Descriptive Tools, Regression, Panel Data

Cornwell and Rupert Panel Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are:
EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
LWAGE = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155.

Page 4: 1. Descriptive Tools, Regression, Panel Data

Page 5: 1. Descriptive Tools, Regression, Panel Data

Nonparametric Regression: Kernel regression of y on x

Semiparametric Regression: Least absolute deviations regression of y on x

Parametric Regression: Least squares – maximum likelihood – regression of y on x

Application: Is there a relationship between Log(wage) and Education?
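To make the three approaches concrete, here is a minimal Python sketch (not the slides' software), assuming the Cornwell-Rupert data sit in a pandas DataFrame with the LWAGE and ED columns listed earlier; the file name and the library's bandwidth defaults are assumptions:

import pandas as pd
import statsmodels.api as sm
from statsmodels.nonparametric.kernel_regression import KernelReg

df = pd.read_csv("cornwell_rupert.csv")   # hypothetical file holding the panel
X = sm.add_constant(df["ED"])

# Parametric: least squares regression of LWAGE on ED
ols_fit = sm.OLS(df["LWAGE"], X).fit()

# Semiparametric: least absolute deviations = median (quantile) regression
lad_fit = sm.QuantReg(df["LWAGE"], X).fit(q=0.5)

# Nonparametric: kernel regression of LWAGE on ED, bandwidth chosen by cross-validation
kr = KernelReg(endog=df["LWAGE"], exog=df["ED"], var_type="c")
fitted_mean, _ = kr.fit()

print(ols_fit.params)
print(lad_fit.params)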

Page 6: 1. Descriptive Tools, Regression, Panel Data

A First Look at the Data: Descriptive Statistics

• Basic Measures of Location and Dispersion

• Graphical Devices
  • Box Plots
  • Histogram
  • Kernel Density Estimator

Page 7: 1. Descriptive Tools, Regression, Panel Data

Page 8: 1. Descriptive Tools, Regression, Panel Data

Box Plots

Page 9: 1. Descriptive Tools, Regression, Panel Data

From Jones and Schurer (2011)

Page 10: 1. Descriptive Tools, Regression, Panel Data

Histogram for LWAGE

Page 11: 1. Descriptive Tools, Regression, Panel Data

Page 12: 1. Descriptive Tools, Regression, Panel Data

The kernel density estimator is a histogram (of sorts):

f̂(x*) = (1/n) Σ_i (1/B) K[(x_i − x*)/B], computed for a set of points x*_m, m = 1,…,M

B = "bandwidth" chosen by the analyst
K = the kernel function, such as the normal or logistic pdf (or one of several others)
x* = the point at which the density is approximated.
This is essentially a histogram with small bins.

Page 13: 1. Descriptive Tools, Regression, Panel Data

Kernel Density Estimator

f̂(x*) = (1/n) Σ_i (1/B) K[(x_i − x*)/B], computed for a set of points x*_m

B = "bandwidth"
K = the kernel function
x* = the point at which the density is approximated.
f̂(x*) is an estimator of f(x*).

The curse of dimensionality: (1/n) Σ_i Q(x_i | x*) → Q(x*). But Var[Q(x*)] is not (1/N) × something; rather, Var[Q(x*)] = (1/N^(3/5)) × something. I.e., f̂(x*) does not converge to f(x*) at the same rate as a mean converges to a population mean.
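A numerical version of the estimator above (a sketch with a normal kernel; the bandwidth rule and the simulated stand-in for LWAGE are assumptions, not the slides' computation):

import numpy as np
from scipy.stats import norm

def kernel_density(x, grid, B):
    """f_hat(x*) = (1/n) sum_i (1/B) K[(x_i - x*)/B], with K the standard normal pdf."""
    x = np.asarray(x)
    n = len(x)
    return np.array([norm.pdf((x - x_star) / B).sum() / (n * B) for x_star in grid])

# Illustration on simulated data standing in for the 4165 LWAGE observations
lwage = np.random.default_rng(0).normal(6.7, 0.45, 4165)
B = 1.06 * lwage.std() * len(lwage) ** (-1 / 5)   # Silverman-type rule-of-thumb bandwidth
grid = np.linspace(lwage.min(), lwage.max(), 200)
fhat = kernel_density(lwage, grid, B)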

Page 14: 1. Descriptive Tools, Regression, Panel Data

Kernel Estimator for LWAGE

Page 15: 1. Descriptive Tools, Regression, Panel Data

From Jones and Schurer (2011)

Page 16: 1. Descriptive Tools, Regression, Panel Data

Objective: Impact of Education on (log) Wage

• Specification: What is the right model to use to analyze this association?

• Estimation
• Inference
• Analysis

Page 17: 1. Descriptive Tools, Regression, Panel Data

Simple Linear Regression: LWAGE = 5.8388 + 0.0652*ED

Page 18: 1. Descriptive Tools, Regression, Panel Data

Multiple Regression

Page 19: 1. Descriptive Tools, Regression, Panel Data

Specification: Quadratic Effect of Experience

Page 20: 1. Descriptive Tools, Regression, Panel Data

Partial Effects

Education: .05654
Experience: .04045 - 2*.00068*Exp
FEM: -.38922
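Spelled out in a small Python snippet (illustrative; the coefficients are the ones reported above):

# Partial effect of experience: d LWAGE / d EXP = .04045 - 2*.00068*EXP
def experience_effect(exp_years):
    return 0.04045 - 2 * 0.00068 * exp_years

for exp_years in (5, 10, 20, 30):
    print(exp_years, round(experience_effect(exp_years), 5))
# The effect declines with experience and turns negative near EXP = .04045/(2*.00068) ≈ 29.7 years.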

Page 21: 1. Descriptive Tools, Regression, Panel Data

Model Implication: Effect of Experience and Male vs. Female

Page 22: 1. Descriptive Tools, Regression, Panel Data

Hypothesis Test About Coefficients

• Hypothesis
  • Null: Restriction on β: Rβ – q = 0
  • Alternative: Not the null
• Approaches
  • Fitting Criterion: R2 decrease under the null?
  • Wald: Rb – q close to 0 under the alternative?

Page 23: 1. Descriptive Tools, Regression, Panel Data

Hypotheses

All Coefficients = 0?  R = [0 | I],  q = [0]
ED Coefficient = 0?  R = [0,1,0,0,0,0,0,0,0,0,0],  q = 0
No Experience effect?  R = [0,0,1,0,0,0,0,0,0,0,0; 0,0,0,1,0,0,0,0,0,0,0],  q = [0; 0]

Page 24: 1. Descriptive Tools, Regression, Panel Data

Hypothesis Test Statistics

Subscript 0 = the model under the null hypothesis
Subscript 1 = the model under the alternative hypothesis

1. Based on the fitting criterion, R2:
   F = [(R1² − R0²)/J] / [(1 − R1²)/(N − K1)] = F[J, N − K1]

2. Based on the Wald distance:
   Chi squared = (Rb − q)′ [R s²(X′X)⁻¹ R′]⁻¹ (Rb − q)
   Note, for linear models, W = JF.
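Both statistics are easy to compute directly. A short Python sketch (not the slides' software); b, V, R, q and the R² values are whatever estimates and restriction matrices the user supplies:

import numpy as np

def f_stat(r2_1, r2_0, J, N, K1):
    """F = [(R1^2 - R0^2)/J] / [(1 - R1^2)/(N - K1)], distributed F[J, N-K1] under the null."""
    return ((r2_1 - r2_0) / J) / ((1 - r2_1) / (N - K1))

def wald_stat(b, V, R, q):
    """W = (Rb - q)' [R V R']^(-1) (Rb - q), with V the estimated covariance matrix of b."""
    d = R @ b - q
    return float(d.T @ np.linalg.solve(R @ V @ R.T, d))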

Page 25: 1. Descriptive Tools, Regression, Panel Data

Hypothesis: All Coefficients Equal Zero

All Coefficients = 0?  R = [0 | I],  q = [0]
R1² = .41826, R0² = .00000
F = 298.7 with [10, 4154]
Wald = b(2-11)′ [V(2-11)]⁻¹ b(2-11) = 2988.3355
Note that Wald = JF = 10(298.7) (some rounding error).
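As a check: F = [(.41826 − .00000)/10] / [(1 − .41826)/4154] ≈ 298.7, and JF = 10 × 298.7 = 2987, which agrees with the reported Wald statistic of 2988.3 up to rounding.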

Page 26: 1. Descriptive Tools, Regression, Panel Data

Hypothesis: Education Effect = 0

ED Coefficient = 0?  R = [0,1,0,0,0,0,0,0,0,0,0],  q = 0
R1² = .41826, R0² = .35265 (not shown)
F = 468.29
Wald = (.05654 − 0)²/(.00261)² = 468.29
Note F = t² and Wald = F for a single hypothesis about one coefficient.

Page 27: 1. Descriptive Tools, Regression, Panel Data

Hypothesis: Experience Effect = 0

No Experience effect?  R = [0,0,1,0,0,0,0,0,0,0,0; 0,0,0,1,0,0,0,0,0,0,0],  q = [0; 0]
R0² = .33475, R1² = .41826
F = 298.15
Wald = 596.3 (critical value W* = 5.99)

Page 28: 1. Descriptive Tools, Regression, Panel Data

Built In Test

Page 29: 1. Descriptive Tools, Regression, Panel Data

Robust Covariance Matrix

• What does robustness mean?
  • Robust to: heteroscedasticity
  • Not robust to:
    • Autocorrelation
    • Individual heterogeneity
    • The wrong model specification
• ‘Robust inference’

The White Estimator:
Est.Var[b] = (X′X)⁻¹ [Σ_i e_i² x_i x_i′] (X′X)⁻¹
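A direct transcription of the White estimator in Python (a sketch, not the slides' software); X is the regressor matrix and e the least squares residual vector:

import numpy as np

def white_cov(X, e):
    """Est.Var[b] = (X'X)^(-1) [sum_i e_i^2 x_i x_i'] (X'X)^(-1) (White / HC0)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * e[:, None] ** 2).T @ X   # sum_i e_i^2 x_i x_i'
    return XtX_inv @ meat @ XtX_inv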

Page 30: 1. Descriptive Tools, Regression, Panel Data

Robust Covariance Matrix

Uncorrected

Page 31: 1. Descriptive Tools, Regression, Panel Data

Bootstrapping

Page 32: 1. Descriptive Tools, Regression, Panel Data

Estimating the Asymptotic Variance of an Estimator

• Known form of asymptotic variance: Compute from known results

• Unknown form, known generalities about properties: Use bootstrapping
  • Root N consistency
  • Sampling conditions amenable to central limit theorems
  • Compute by resampling mechanism within the sample.

Page 33: 1. Descriptive Tools, Regression, Panel Data

Bootstrapping

Method:
1. Estimate parameters using full sample: b
2. Repeat R times:
   Draw n observations from the n, with replacement.
   Estimate with b(r).
3. Estimate variance with
   V = (1/R) Σ_r [b(r) − b][b(r) − b]′
(Some use the mean of the replications instead of b. Advocated (without motivation) by the original designers of the method.)
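The same recipe for least squares, as a minimal numpy sketch (an illustration, not the slides' software); y, X and the number of replications R are supplied by the user:

import numpy as np

def ols(y, X):
    # Least squares coefficients
    return np.linalg.lstsq(X, y, rcond=None)[0]

def bootstrap_cov(y, X, R=100, seed=0):
    """V = (1/R) sum_r [b(r) - b][b(r) - b]', resampling observations with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    b = ols(y, X)                         # step 1: full-sample estimate
    diffs = []
    for _ in range(R):                    # step 2: R bootstrap replications
        idx = rng.integers(0, n, size=n)  # draw n observations with replacement
        d = ols(y[idx], X[idx]) - b
        diffs.append(np.outer(d, d))
    return sum(diffs) / R                 # step 3: average of outer products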

Page 34: 1. Descriptive Tools, Regression, Panel Data

Application: Correlation between Age and Education

Page 35: 1. Descriptive Tools, Regression, Panel Data

Bootstrap Regression - Replications

namelist;x=one,y,pg$            Define X
regress;lhs=g;rhs=x$            Compute and display b
proc                            Define procedure
regress;quietly;lhs=g;rhs=x$    … Regression (silent)
endproc                         Ends procedure
execute;n=20;bootstrap=b$       20 bootstrap reps
matrix;list;bootstrp $          Display replications

Page 36: 1. Descriptive Tools, Regression, Panel Data

--------+-------------------------------------------------------------
Variable| Coefficient  Standard Error  t-ratio  P[|T|>t]  Mean of X
--------+-------------------------------------------------------------
Constant|  -79.7535***       8.67255    -9.196    .0000
       Y|    .03692***        .00132    28.022    .0000    9232.86
      PG|  -15.1224***       1.88034    -8.042    .0000    2.31661
--------+-------------------------------------------------------------
Completed 20 bootstrap iterations.
----------------------------------------------------------------------
Results of bootstrap estimation of model.
Model has been reestimated 20 times.
Means shown below are the means of the
bootstrap estimates. Coefficients shown
below are the original estimates based
on the full sample.
bootstrap samples have 36 observations.
--------+-------------------------------------------------------------
Variable| Coefficient  Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
--------+-------------------------------------------------------------
    B001|  -79.7535***       8.35512    -9.545    .0000   -79.5329
    B002|    .03692***        .00133    27.773    .0000     .03682
    B003|  -15.1224***       2.03503    -7.431    .0000   -14.7654
--------+-------------------------------------------------------------

Results of Bootstrap Procedure

Page 37: 1. Descriptive Tools, Regression, Panel Data

Bootstrap Replications

Full sample result

Bootstrapped sample results

Page 38: 1. Descriptive Tools, Regression, Panel Data

Multiple Imputation for Missing Data

Page 39: 1. Descriptive Tools, Regression, Panel Data

Imputed Covariance Matrix

Page 40: 1. Descriptive Tools, Regression, Panel Data

Implementation

• SAS, Stata: Create full data sets with imputed values inserted. M = 5 is the familiar standard number of imputed data sets.
• NLOGIT/LIMDEP:
  • Create an internal map of the missing values and a set of engines for filling missing values
  • Loop through imputed data sets during estimation.
  • M may be arbitrary – memory usage and data storage are independent of M.
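The slides do not spell out how the M sets of estimates are pooled; the standard device is Rubin's combining rules, sketched below for a single parameter (plain Python, an illustration rather than any package's implementation):

import numpy as np

def combine_mi(estimates, variances):
    """Rubin's rules: pool M imputed-data estimates of one parameter.
    estimates: the M point estimates b_m
    variances: the M within-imputation variance estimates Var[b_m]
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    M = len(estimates)
    b_bar = estimates.mean()            # pooled point estimate
    W = variances.mean()                # average within-imputation variance
    B = estimates.var(ddof=1)           # between-imputation variance
    return b_bar, W + (1 + 1 / M) * B   # total variance of the pooled estimate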