Econometrics I - New York...

25
Part 4: Partial Regression and Correlation 4-1/25 Econometrics I Professor William Greene Stern School of Business Department of Economics

Transcript of Econometrics I - New York...

Page 1: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-1/25

Econometrics I Professor William Greene

Stern School of Business

Department of Economics

Page 2: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-2/25

Econometrics I

Part 4 – Partial Regression

and Partial Correlation

Page 3: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-3/25

Frisch-Waugh (1933) ‘Theorem’

Context: Model contains two sets of variables:

X = [ (1,time) : (other variables)]

= [X1 X2]

Regression model:

y = X11 + X22 + (population)

= X1b1 + X2b2 + e (sample)

Problem: Algebraic expression for the second set of least squares coefficients, b2

Page 4: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-4/25

Partitioned Solution

Method of solution (Why did F&W care? In 1933, matrix computation was not trivial!)

Direct manipulation of normal equations produces

, ] so and

( ) =

1 1 1 2 1

1 2

2 1 2 2 2

1 1 1 2 1 1

2 1 2 2 2 2

1 1 1 1 2 2 1

2 1 1 2 2 2 2 2 2 2 2 2 1 1

X X b X y

X X X X X yX = [X X X X = X y =

X X X X X y

X X X X b X y(X X)b = =

X X X X b X y

X X b X X b X y

X X b X X b X y ==> X X b X y - X X b

= 2 1 1X (y - X b )

Page 5: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-5/25

Partitioned Solution

Direct manipulation of normal equations produces

b2 = (X2X2)-1X2(y - X1b1)

What is this? Regression of (y - X1b1) on X2

If we knew b1, this is the solution for b2.

Important result (perhaps not fundamental). Note the result

if X2X1 = 0.

Useful in theory: Probably

Likely in practice? Not at all.

Page 6: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-6/25

Partitioned Inverse

Use of the partitioned inverse result produces a

fundamental result: What is the southeast

element in the inverse of the moment matrix?

1 1 1 2

2 1 2 2

-1X 'X X 'X

X 'X X 'X

Page 7: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-7/25

Partitioned Inverse

The algebraic result is:

[ ]-1(2,2) = {[X2’X2] - X2’X1(X1’X1)-1 X1’X2}

-1

= [X2’(I - X1(X1’X1)-1X1’)X2]

-1

= [X2’M1X2]-1

Note the appearance of an “M” matrix. How do we interpret this result?

Note the implication for the case in which X1 is a single variable. (Theorem, p. 37)

Note the implication for the case in which X1 is the constant term. (p. 38)

Page 8: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-8/25

Frisch-Waugh (1933) Basic Result

Lovell (JASA, 1963) did the matrix algebra.

Continuing the algebraic manipulation: b2 = [X2’M1X2]

-1[X2’M1y]. This is Frisch and Waugh’s famous result - the “double residual

regression.” How do we interpret this? A regression of residuals on residuals. “We get the same result whether we (1) detrend the other variables

by using the residuals from a regression of them on a constant and a time trend and use the detrended data in the regression or (2) just include a constant and a time trend in the regression and not detrend the data”

“Detrend the data” means compute the residuals from the

regressions of the variables on a constant and a time trend.

Page 9: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-9/25

Important Implications

Isolating a single coefficient in a regression. (Corollary 3.2.1, p. 37). The double residual regression.

Regression of residuals on residuals – ‘partialling’ out the effect of the other variables.

It is not necessary to ‘partial’ the other Xs out of y because M1 is idempotent. (This is a very useful result.) (i.e., X2M1’M1y = X2M1y)

(Orthogonal regression) Suppose X1 and X2 are orthogonal; X1X2 = 0. What is M1X2?

Page 10: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-10/25

Applying Frisch-Waugh

Using gasoline data from Notes 3.

X = [1, year, PG, Y], y = G as before.

Full least squares regression of y on X.

Partitioned regression strategy:

1. Regress PG and Y on (1,Year) (detrend them) and compute

residuals PG* and Y*

2. Regress G on (1,Year) and compute residuals G*. (This step

is not actually necessary.)

3. Regress G* on PG* and Y*. (Should produce -11.9827 and

0.0478181.

Page 11: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-11/25

Detrending the Variables – Pg

Pg* are the residuals from this regression

Page 12: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-12/25

Regression of detrended G on detrended Pg

and detrended Y

Page 13: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-13/25

Partial Regression

Important terms in this context: Partialing out the effect of X1. Netting out the effect … “Partial regression coefficients.” To continue belaboring the point: Note the interpretation of partial

regression as “net of the effect of …” This is the (very powerful) Frisch – Waugh Theorem. This is what is

meant by “controlling for the effect of X1.” Now, follow this through for the case in which X1 is just a constant term,

column of ones. What are the residuals in a regression on a constant. What is M1? Note that this produces the result that we can do linear regression on data

in mean deviation form. 'Partial regression coefficients' are the same as 'multiple regression

coefficients.' It follows from the Frisch-Waugh theorem.

Page 14: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-14/25

Partial Correlation

Working definition. Correlation between sets of residuals. Some results on computation: Based on the M matrices. Some important considerations: Partial correlations and coefficients can have signs and magnitudes that differ greatly from gross correlations and

simple regression coefficients. Compare the simple (gross) correlation of G and PG with the partial

correlation, net of the time effect. (Could you have predicted the negative partial correlation?)

CALC;list;Cor(g,pg)$ Result = .7696572 CALC;list;cor(gstar,pgstar)$ Result = -.6589938

G

1. 00

1. 50

2. 00

2. 50

3. 00

3. 50

4. 00

4. 50

. 50

80 90 100 110 12070PG

Page 15: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-15/25

Partial Correlation

Page 16: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-16/25

THE Most Famous Application of

Frisch-Waugh: The Fixed Effects Model

A regression model with a dummy variable for

each individual in the sample, each observed Ti times.

There is a constant term for each individual.

yi = Xi + diαi + εi, for each individual

1 1

2 2

N

=

=

N

1

2

N

y X d 0 0 0

y X 0 d 0 0 βε

α

y X 0 0 0 d

β [X,D] ε

α

Zδ ε

N may be thousands. I.e., the regression has thousands of variables (coefficients).

N columns

Page 17: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-17/25

Application – Health and Income

German Health Care Usage Data, 7,293 Individuals,

Varying Numbers of Periods

Variables in the file are

Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

panel with N = 7,293 individuals. There are altogether n = 27,326 observations. The

number of observations ranges from 1 to 7 per family.

(Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987).

The dependent variable of interest is

DOCVIS = number of visits to the doctor in the observation period

HHNINC = household nominal monthly net income in German marks / 10000.

(4 observations with income=0 were dropped)

HHKIDS = children under age 16 in the household = 1; otherwise = 0

EDUC = years of schooling

AGE = age in years

We desire also to include a separate family effect (7293 of them) for each family. This

requires 7293 dummy variables in addition to the four regressors.

Page 18: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-18/25

Estimating the Fixed Effects Model

The FE model is a linear regression model with many independent variables

1

1

Using the Frisch-Waugh theorem

=[ ]

D D

b X X X D X y

a D X D D D y

b X M X X M y

Page 19: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-19/25

Fixed Effects Estimator (cont.)

1 ( )

...

...

D

1

2

N

1 1

2 2

N N

M I D D D D

d 0 0

0 d 0D

0 0 d

d d 0 0

0 d d 0DD

0 0 d d

Page 20: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-20/25

Fixed Effects Estimator (cont.)

i i i

i i i

i i i

1

1

i

1 1 1T T T

1 1 1T T T

1 1 1T T T

( )

(The dummy variables are orthogonal)

( ) (1/T )

1 - - ... -

- 1 - ... -

... ... ... ...

- - ... 1 -

i i

D

1

D

2

DD

N

D

i

D T i i i T i

M I D D D D

M 0 0

0 M 0M

0 0 M

M I d d d d = I d d

=

Page 21: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-21/25

‘Within’ Transformations

1

i i i i

i i i i i i i i i ii

i i

N

i=1 i i

T

i i t=1k,l

( ) (1/T )

(1/T ) (1/T ) (T )

(T K) (T 1) * (1 K)

,

i i

i

D T i i i T i

i i

i

D D

i

D

M X I d d d d X = I d d X

X d d X X d x X d x

X M X = XM X

XM X

i

i

it,k i.,k it,l i.,l

N

i=1 i i

T

i i t=1 it,k i.,k it i.k

(x -x )(x -x ) (k,l element)

,

(x -x )(y -y )

i

D D

i

D

X M y = XM y

XM y

Page 22: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-22/25

Least Squares Dummy Variable Estimator

b is obtained by ‘within’ groups least squares (group

mean deviations)

Normal equations for a are D’Xb+D’Da=D’y

a = (D’D)-1D’(y – Xb) (see slide 5)

iT

i i t=1 it it ia=(1/T)Σ (y - )=ex b

Page 23: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-23/25

A Fixed Effects Regression Constant terms

Page 24: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-24/25

What happens to a time invariant

variable such as gender?

iTn n 2

i=1 i i i=1 t=1 it i.

it i it i. it i.

= a variable that is the same in every period (FEMALE)

(f -f )

but f = f , so f f and (f -f ) 0 for all i and t

This is a simple case of multicollinearity.

=

i

D D

f

f M f fM f

fi

diag(f )*D

Page 25: Econometrics I - New York Universitypeople.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-4.pdf · Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced

Part 4: Partial Regression and Correlation 4-25/25