
Page 1:

Lecture 7

Multiple Regression & Matrix Notation

Quantitative Methods 2

Edmund Malesky, Ph.D., UCSD

Page 2:

Order of Presentation

1. Review of Variance of Beta Hat

2. Review of T-Tests

3. Review of Quadratic Equations

4. Introduction to Multiple Regression

5. The Role of Control Variables

6. Interpreting Regression Output

Page 3:

What does the variance of beta hat tell us?

• Remember, we are working with a sample of the true population.
• We are using that sample as a way to estimate the true relationships between variables (regression parameters) in the population.
• As in QM1, we must always remember that our estimates will be slightly different each time we sample from the population.
• We know that the mean of repeated samples will equal the population parameter, but we still might want to have some sense of the variance.
• The smaller the variance, the more efficient the estimate.
• As a result, we need some sense of the range that would occur after repeated sampling.
• The confidence interval, derived from the standard error (SE) of the regression parameter, is the way we estimate that range.

Page 4:

The Estimated Variance of β1hat

• Var(β1hat) has nice intuitive qualities.
• As the size of the errors decreases, Var(β1hat) decreases.
– The line fits tightly through the data. Few other lines could fit as well.
• As the variation in x increases, Var(β1hat) decreases.
– Few lines will fit without large errors for extreme values of x.

$$\widehat{\operatorname{Var}}(\hat\beta_1) = \hat\sigma^2_{\hat\beta_1} = \frac{\hat\sigma^2_u}{\sum_i (x_i - \bar{x})^2}$$

Remember how we did this in STATA in Lecture 4: we divided the root mean squared error of the model by the standard deviation of the independent variable. This gave us the standard error (SE) of beta hat, which is the square root of the variance of beta hat.
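A minimal sketch of that calculation in Stata, assuming a generic dataset with hypothetical variables y and x (the names are illustrative, not from the lecture data):

* Recovering SE(beta1hat) by hand after a bivariate regression
quietly regress y x
quietly summarize x
* SE(beta1hat) = RMSE / sqrt( sum of (x_i - xbar)^2 ) = RMSE / ( sd(x) * sqrt(n-1) )
display e(rmse) / (r(sd) * sqrt(e(N) - 1))
display _se[x]    // the standard error reported by -regress-, for comparison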

Page 5:

The Estimated Variance of β1hat

• Because the variance of the estimated errors has n in the denominator, as n increases, the variance of β1hat decreases.
– The more data points we must fit to the line, the smaller the number of lines that fit with few errors.
– We have more information about where the line must go.

$$\hat\sigma^2_u = \frac{\sum_i \hat u_i^2}{n-2} \qquad\qquad \widehat{\operatorname{Var}}(\hat\beta_1) = \hat\sigma^2_{\hat\beta_1} = \frac{\hat\sigma^2_u}{\sum_i (x_i - \bar{x})^2}$$

Page 6:

Variance of β1hat is Critical for Hypothesis Testing

• The t-test tests that individual coefficients are not zero.
– This is the central task for testing most policy theories.

Page 7:

T-Tests

• In general, our theories give us hypotheses that β0 > 0 or β1 < 0, etc.
• We can estimate β1hat, but we need a way to assess the validity of statements that β1 is positive or negative, etc.
• We can rely on our estimate of β1hat and its variance to use probability theory to test such statements.

Page 8:

Z-Scores & Hypothesis Tests

• We know that β1hat ~ N(β1, σβ).
• Subtracting β1 from both sides, we can see that (β1hat − β1) ~ N(0, σβ).
• Then, if we divide by the standard deviation, we can see that: (β1hat − β1)/σβ ~ N(0, 1).
• To test the null hypothesis that β1 = 0, we can see that: β1hat/σβ ~ N(0, 1).

Page 9:

Z-Scores & Hypothesis Tests

• This variable is a “z-score” based on the standard normal distribution.

• 95% of cases are within 1.96 standard deviations of the mean.

• If β1hat/σβ > 1.96, then in a series of random draws there is a 95% chance that β1 > 0.

• The key problem is that we don’t actually know σβ , the true population parameter.

Page 10:

Z-Scores and t-scores

• The obvious solution is to substitute the estimate σ̂β in place of σβ.
• Problem: β1hat/σ̂β is the ratio of two random variables, and this will not be normally distributed.
• Fortunately, an employee of Guinness Brewery figured out this distribution in 1908.

Page 11:

The t-statistic

• The statistic is called "Student's t," and the t-distribution looks similar to a normal distribution.
• Thus β1hat/σ̂β ~ t(n−2) for bivariate regression.
• More generally, β1hat/σ̂β ~ t(n−k),
– where k is the # of parameters estimated.

Page 12:

The t-statistic

• Note the addition of a “degrees of freedom” constraint

• Thus the more data points we have relative to the number of parameters we are trying to estimate, the more the t distribution looks like the z distribution.

• When n>100 the difference is negligible
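A quick way to see this in Stata, using its built-in distribution functions:

* Two-tailed 5% critical values: the t cutoff approaches the z cutoff as df grows
display invttail(10, .025)     // 2.23
display invttail(100, .025)    // 1.98
display invnormal(.975)        // 1.96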

Page 13:

Limited Information in Statistical Significance Tests

• Results often illustrative rather than precise

• Only tests “not zero” hypothesis – does not measure the importance of the variable (look at confidence interval)

• Generally reflects confidence that results are robust across multiple samples

Page 14:

T-distribution: The Statistical Workhorse

Let χ² have a chi-square distribution with n degrees of freedom. Then

$$T = \frac{Z}{\sqrt{\chi^2/n}} \sim t_n,$$

which has mean 0 and variance n/(n−2). As the degrees of freedom increase, the t-distribution approaches the normal distribution.

[Figure: t densities for df = 2, 4, and 6, plotted from −3 to 3.]

Page 15:

Quick Review: Hypothesis Testing

• In STATA, the null hypothesis for a two-tailed t-test is: H0: βj=0

Page 16:

Quick Review: Hypothesis Testing

• To test the hypothesis, I need a rejection rule. That is, I will reject the null hypothesis if |t| is greater than some critical value (c) of the t distribution. You may know this in Excel lingo as tcrit.

• c is up to me to some extent: I must determine what level of significance I am willing to accept. For instance, if my t-value is 1.85 with 40 df and I was willing to reject only at the 5% level, my c would equal 2.021 and I would not reject the null. On the other hand, if I was willing to reject at the 10% level, my c would be 1.684, and I would reject the null hypothesis.

Rejection rule: |t| > c
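The critical values in this example can be checked with Stata's built-in invttail() function:

* Critical values for a two-tailed test with 40 df
display invttail(40, .025)    // 5% level: 2.021
display invttail(40, .05)     // 10% level: 1.684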

Page 17:

t-distribution: 5% rejection rule for H0: βj = 0 with 25 degrees of freedom

Looking at table G-2, I find the critical value for a two-tailed test is 2.06.

[Figure: t density with rejection regions of area = .025 in each tail, beyond −2.06 and 2.06.]

Page 18:

Quick Review:

• But this operation hides some very useful information.

• STATA has decided that it is more useful to provide the smallest level of significance at which the null hypothesis would be rejected. This is known as the p-value.

• In the previous example, we know that .05 < p < .10.

• To calculate p, STATA computes the area under the probability density function.

Page 19:

t-distribution: Obtaining the p-value against a two-sided alternative, when t = 1.85 and df = 40

p-value = P(|T| > t)

In this case, P(|T| > 1.85) = 2·P(T > 1.85) = 2(.0359) = .0718

[Figure: t density with area = .9282 in the center and rejection regions of area = .0359 in each tail.]
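This area calculation is one line in Stata, using the built-in ttail() function:

* p-value against a two-sided alternative, t = 1.85 with 40 df
display 2*ttail(40, 1.85)    // .0718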

Page 20:

For Example… Presidential Approval and the CPI

. reg approval cpi

      Source |       SS       df       MS         Number of obs =     148
  -----------+------------------------------     F(  1,   146) =    9.76
       Model |  1719.69082     1  1719.69082     Prob > F      =  0.0022
    Residual |  25731.4061   146  176.242507     R-squared     =  0.0626
  -----------+------------------------------     Adj R-squared =  0.0562
       Total |  27451.0969   147  186.742156     Root MSE      =  13.276

  ----------------------------------------------------------------------------
    approval |      Coef.   Std. Err.      t     P>|t|    [95% Conf. Interval]
  -----------+----------------------------------------------------------------
         cpi |  -.1348399   .0431667   -3.124   0.002    -.2201522   -.0495277
       _cons |   60.95396   2.283144   26.697   0.000     56.44168    65.46624
  ----------------------------------------------------------------------------

. sum cpi

    Variable |    Obs        Mean   Std. Dev.       Min        Max
  -----------+----------------------------------------------------
         cpi |    148    46.45878    25.36577      23.5        109

Page 21:

So the distribution of β1hat is:

[Figure: histogram (fraction) of simulated cpi slope estimates, running from about −.3 to .1 and centered on −.135.]

Page 22:

Now Let's Look at Approval and the Unemployment Rate

. reg approval unemrate

      Source |       SS       df       MS         Number of obs =     148
  -----------+------------------------------     F(  1,   146) =    0.85
       Model |  159.716707     1  159.716707     Prob > F      =  0.3568
    Residual |  27291.3802   146  186.927262     R-squared     =  0.0058
  -----------+------------------------------     Adj R-squared = -0.0010
       Total |  27451.0969   147  186.742156     Root MSE      =  13.672

  ----------------------------------------------------------------------------
    approval |      Coef.   Std. Err.      t     P>|t|    [95% Conf. Interval]
  -----------+----------------------------------------------------------------
    unemrate |  -.5973806   .6462674   -0.924   0.357    -1.874628    .6798672
       _cons |   58.05901   3.814606   15.220   0.000     50.52003    65.59799
  ----------------------------------------------------------------------------

. sum unemrate

    Variable |    Obs        Mean   Std. Dev.       Min        Max
  -----------+----------------------------------------------------
    unemrate |    148    5.640541    1.744879       2.6       10.7

Page 23:

Now the Distribution of β1hat is:

[Figure: histogram (fraction) of simulated unemrate slope estimates, running from about −3 to 3 and centered on −.597.]

Page 24:

Quadratic Review

Page 25:


      Source |       SS       df       MS         Number of obs =     177
       Model |  5631329.68     2  2815664.84     F(  2,   174) =   21.15
    Residual |  23159327.2   174  133099.582     Prob > F      =  0.0000
       Total |  28790656.9   176  163583.278     R-squared     =  0.1956
                                                 Adj R-squared =  0.1863
                                                 Root MSE      =  364.83

     profits |      Coef.   Std. Err.      t     P>|t|    [95% Conf. Interval]
     lsalary |  -1555.042   634.4506   -2.45   0.015    -2807.252   -302.8324
  lsalary_sq |   138.9452   48.31807    2.88   0.005      43.5802    234.3101
       _cons |   4372.655   2076.122    2.11   0.037     275.0304    8470.279

Page 26:

Quadratic Review

$$1.\quad \hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_1^2$$

$$2.\quad \frac{d\hat y}{dx_1} = \hat\beta_1 + 2\hat\beta_2 x_1$$

$$3.\quad x_1^* = -\frac{\hat\beta_1}{2\hat\beta_2}, \quad\text{when } d\hat y/dx_1 = 0$$

[Figure: parabola in (x1, ŷ) space illustrating the turning point.]

Page 27:

Quadratic Review

1. β0hat is the intercept, as in the linear equation.
2. β1hat is the slope moving from x = 0 to the first unit of x.
3. β2hat is used to calculate the slope at other points on the line.
a) A positive coefficient on β2hat means the curve turns upward.
b) A negative coefficient on β2hat means the curve turns downward.
4. Use equation 1 to get the predicted value for each point on the line.
5. Use equation 2 to get the slope for each point on the curve.
6. Use equation 3 to isolate the point where the slope is equal to 0.
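For instance, plugging the lsalary coefficients from the profits regression on page 25 into equation 3 puts the turning point at roughly 5.6 in logged-salary units; a one-line check in Stata:

* Turning point x* = -b1hat/(2*b2hat), using the coefficients from page 25
display -(-1555.042)/(2*138.9452)    // approximately 5.60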

Page 28:

[Figure 1: Kuznets Predictions and Actual Relationship between Growth and Inequality. Simulated Kuznets curve plotting the Gini coefficient (25-45) against the natural log of GDP per capita (4-10), with estimated slopes based on the Kuznets curve for Vietnam and for China, and actual slopes for Vietnam and China (1993-2004). Kuznets simulations based on Higgins & Williamson 2002:284 regression parameters for the 1990s.]

Page 29:

Dealing with a Complicated World

• Multiple regression to address multiple causes

Page 30:

Multiple Regression: What if y has more than just ONE cause?

• We have found an estimator for the relationship between x and y

• We have developed methods to use the estimator to test hypotheses derived from theories about x and y.

• But we have only 1 x (and only 1 β)

• The world is more complicated than that!

Page 31:

Multiple Regression Analysis

• We can make a simple extension of the bivariate model to the multivariate case

• Instead of a two dimensional space (x and y axes) we move into multi-dimensional space

• If we have x1 and x2, then we are fitting a two dimensional plane through points in space.

Page 32:

The Bivariate Regression

[Figure: scatterplot with x1 (expenditures of Candidate A in $1000s, 2000-2003) on the horizontal axis and y (votes for Candidate A in 2004) on the vertical axis.]

Page 33:

Now, we add another variable (x2)

[Figure: three-dimensional scatterplot. y: votes for Candidate A in 2004; x1: expenditures of Candidate A in $1000s (2000-2003); x2: 100s of new jobs created (2000-2003).]

Page 34:

Explanation of Multivariate Analysis

Because multiple linear regression includes more than a single independent variable, the result of an analysis is best visualized as a plane rather than as the line of a bivariate regression analysis.

This plane is defined by a series of slopes and a y-intercept value, and oriented such that deviations between the observed data points and the plane are minimized in the direction of the dependent variable.

Page 35:

Two Dimensional Plane in 3D Space

$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$$

[Figure: the fitted regression plane in three dimensions. y: votes for Candidate A in 2004; x1: expenditures of Candidate A in $1000s (2000-2003); x2: 100s of new jobs created (2000-2003).]

Page 36:

Interpretation of Multiple Regression

$$\hat y\,(\text{votes}) = \hat\beta_0 + \hat\beta_1\,\text{Expenditure} + \hat u$$

$$\hat y\,(\text{votes}) = \hat\beta_0 + \hat\beta_1\,\text{Expenditure} + \hat\beta_2\,\text{JobGrowth} + \hat u$$

Q1. If we modeled only an equation with expenditures, where would the impact of Job Growth show up in our results?
A1. It would show up in a larger residual.

Q2. How do I interpret the coefficient β1hat in my STATA output?
A2. β1hat is the ceteris paribus effect of expenditures on vote changes, controlling for job growth.

Q3. What does "controlling for" mean?
A3. It means that the effect of Job Growth is held fixed or constant: ΔJobGrowth = 0.

$$\Delta\hat y\,(\text{votes}) = \hat\beta_1 \cdot \Delta\text{Expenditure}, \text{ when } \Delta\text{JobGrowth} = 0$$

$$\Delta\hat y\,(\text{votes}) = \hat\beta_2 \cdot \Delta\text{JobGrowth}, \text{ when } \Delta\text{Expenditure} = 0$$

Page 37:

Another way to think of Partial Effects

Beginning with this equation:

$$\hat y\,(\text{votes}) = \hat\beta_0 + \hat\beta_1\,\text{Expenditure} + \hat\beta_2\,\text{JobGrowth} + \hat u$$

Now, I want to calculate the impact of Expenditures on Votes after the impact of Job Growth has been partialed out. One way to do this is to try to predict changes in x1 (Expenditures) with x2 (Job Growth); in other words, I regress x1 on x2:

$$x_1 = \hat\delta_0 + \hat\delta_2 x_2 + \hat r_1,$$

where r̂1 is the portion of the variation in x1 not explained by changes in x2. Then,

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} \hat r_{i1}\, y_i}{\sum_{i=1}^{n} \hat r_{i1}^2}.$$

Basically, β1hat measures the sample relation between x1 and y after x2 has been partialled out.
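This partialling-out recipe can be checked directly in Stata; a minimal sketch, assuming hypothetical variables y, x1, and x2 (the names are illustrative, not from the lecture data):

* Partialling out: the slope on r1 reproduces b1hat from the full regression
quietly regress x1 x2
predict double r1, residuals    // variation in x1 not explained by x2
regress y r1                    // coefficient on r1 equals b1hat from: regress y x1 x2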

Page 38:

Illustrating this approach to β1hat

• Coefficient β1hat is calculated based on the area in the yellow circle that overlaps with blue, but NOT with red.

[Venn diagram: overlapping circles for y, x1, and x2. The overlap between x1 and x2 is their covariance; the center area, where all three circles overlap, is discarded, because we can't say which variable accounts for it.]

Page 39:

Relationship between Multiple Regression and the Bivariate World

Let's start with a simple bivariate equation, where I denote the predicted values using tildes (~):

$$\tilde y\,(\text{votes}) = \tilde\beta_0 + \tilde\beta_1\,\text{Expenditure} + \tilde u$$

If I add a new variable in multiple regression,

$$\hat y\,(\text{votes}) = \hat\beta_0 + \hat\beta_1\,\text{Expenditure} + \hat\beta_2\,\text{JobGrowth} + \hat u,$$

what is the relationship between β̃1 and β̂1?

$$\tilde\beta_1 = \hat\beta_1 + \hat\beta_2\,\tilde\delta_1,$$

where δ̃1 is the slope coefficient from regressing x_i2 on x_i1.
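A minimal sketch of this identity in Stata, again assuming hypothetical variables y, x1, and x2:

* Omitted-variable algebra: btilde1 = b1hat + b2hat * delta1tilde
quietly regress y x1             // bivariate slope (btilde1)
scalar btilde1 = _b[x1]
quietly regress y x1 x2          // multiple regression (b1hat, b2hat)
scalar b1 = _b[x1]
scalar b2 = _b[x2]
quietly regress x2 x1            // delta1tilde: slope from regressing x2 on x1
scalar delta1 = _b[x1]
display btilde1
display b1 + b2*delta1           // the two displayed numbers should match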

Page 40:

An Example from HW1

ỹ (Progress) = 67.96945 − 9.844708 (distance) + ũ
ŷ (Progress) = 70.00648 − 7.78955 (distance) − .3064532 (postcomseats) + û
postcomseats = 6.647114 + 6.706271 (distance)
β̃ (distance) = −7.78955 + (−.3064532)(6.706271)

. display -7.78955+-.3064532*(6.706271)
-9.844708

Page 41:

Change in Gauss-Markov Assumption 4 – Zero Conditional Mean

• Zero conditional mean: before, we summarized GM4 as "the population error (u) has an expected value of 0 for any value of the explanatory variable (x)":

$$E(u \mid x_1) = 0$$

Essentially, this meant that other factors having a direct impact on y (i.e., changes in votes) are unrelated on average to x (expenditures).

• Now, GM4 becomes "the population error (u) has an expected value of 0 for any combination of x1 & x2":

$$E(u \mid x_1, x_2) = 0$$

Other factors having a direct impact on y (i.e., changes in votes) are unrelated on average to x1 (expenditures) and x2 (job growth). The equation also implies that we have correctly specified the functional form between the independent and dependent variables!

Page 42:

Change in Gauss-Markov Assumption 3 – No Perfect Collinearity

• Before, GM3 was that there must be sample variation in the explanatory variables: the xi's (x1, x2, x3, ..., xn) are not all the same value.

• Now GM3 reads: none of the independent variables is constant, and there are no exact linear relationships among the independent variables.

Essentially, if any one of our x's is perfectly explained by the others, it will drop out of our model.

Page 43:

Multiple Regression Analysis

• Above 3 dimensions, MR becomes difficult to visualize.
• The logic of the process is the same. We are fitting SETS of x's to each point on a y dimension.
• β0hat remains the intercept, and β1hat, β2hat, ... are called slope estimates.
• Though in a quadratic function, calling the estimates on both coefficients "slopes" is slightly incorrect. Why?
• The basic equation of the true population model in scalar notation is:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \cdots + \beta_k x_{ik} + u_i$$

Page 44:

And the Slope Coefficients?

• In scalar terms, the equation for β̂ of variable k becomes:

$$\hat\beta_k = \frac{\sum_i (x_{ik} - \hat x_{ik})\, y_i}{\sum_i (x_{ik} - \hat x_{ik})^2}$$

• where x̂ik = the linear prediction of x_ik based on the other x's.
• This is similar to the bivariate estimator; there we used x̄ because we lacked any better expectation about x.

Please note that Wooldridge indexes observations by "t" instead of "i" in the matrix algebra discussions. We use "i" to maintain the analogy with scalar algebra.

Page 45:

Shifting to Matrix Notation

• Writing out these terms and multiplying them in scalar notation is clumsy.

• Represented in simpler terms through linear (matrix) algebra

• The basic equation becomes:

$$y = X\hat\beta + \hat u$$

Page 46:

The Multiple Regression Equation

• The vectors and matrices in $y = X\hat\beta + \hat u$ are represented by:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix} \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \vdots \\ \hat\beta_k \end{bmatrix} + \begin{bmatrix} \hat u_1 \\ \hat u_2 \\ \vdots \\ \hat u_n \end{bmatrix}$$

• Note that we post-multiply X by β̂, since this order makes them conformable.

Page 47:

Math Tools With Matrices

• To derive our vector of coefficients βhat, we will need to do some math with matrices

• Multiplying matrices

• Taking the transpose of a matrix

• Inverting a matrix

Page 48:

We Can Multiply Matrices

• Multiplication:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \times \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{bmatrix}$$

• Where:

$$c_{11} = a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} \qquad c_{12} = a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32}$$
$$c_{21} = a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} \qquad c_{22} = a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32}$$
$$c_{31} = a_{31}b_{11} + a_{32}b_{21} + a_{33}b_{31} \qquad c_{32} = a_{31}b_{12} + a_{32}b_{22} + a_{33}b_{32}$$
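Stata's matrix language carries out this multiplication directly; a small sketch with made-up numbers:

* A (3x3) times B (3x2) gives C (3x2)
matrix A = (1,2,3 \ 4,5,6 \ 7,8,9)
matrix B = (1,2 \ 3,4 \ 5,6)
matrix C = A*B
matrix list C    // e.g., c11 = 1*1 + 2*3 + 3*5 = 22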

Page 49:

The Transpose of a Matrix A'

• Taking the transpose is an operation that creates a new matrix based on an existing one.

• The rows of A = the columns of A'

• Hold upper left and lower right corners and rotate 180 degrees.

Page 50:

Example of a transpose

$$A = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}, \qquad A' = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$
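In Stata's matrix language, the apostrophe is the transpose operator (as in the X'X examples later in this lecture); a minimal sketch:

* Transposing the example matrix
matrix A = (1,4 \ 2,5 \ 3,6)
matrix At = A'
matrix list At    // the rows of A are the columns of A'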

Page 51:

A Popular Matrix (Squared Residuals): û'û

$$\hat u'\hat u = \begin{bmatrix} \hat u_1 & \hat u_2 & \cdots & \hat u_n \end{bmatrix} \begin{bmatrix} \hat u_1 \\ \hat u_2 \\ \vdots \\ \hat u_n \end{bmatrix} = \sum_{i=1}^{n} \hat u_i^2$$
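The same quantity can be computed in Stata; a sketch assuming hypothetical variables y, x1, and x2 (-regress- also stores it as e(rss)):

* u'u: the sum of squared residuals
quietly regress y x1 x2
predict double uhat, residuals
generate double uhat2 = uhat^2
quietly summarize uhat2
display r(sum)    // u'u computed from the residuals
display e(rss)    // the same quantity stored by -regress-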

Page 52:

Minimizing the Sum of Squares

• Thus û'û IS the sum of squared residuals (SSR).
• In matrix terminology, we want to pick the vector of β̂'s that minimizes û'û.
• As in scalar notation, the vector of β̂'s is a function of the X matrix and the y vector.

Page 53:

Another popular matrix: X'X

$$X = \begin{bmatrix} 1 & x_{11} \\ 1 & x_{21} \\ \vdots & \vdots \\ 1 & x_{n1} \end{bmatrix}$$

$$X'X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_{11} & x_{21} & \cdots & x_{n1} \end{bmatrix} \begin{bmatrix} 1 & x_{11} \\ 1 & x_{21} \\ \vdots & \vdots \\ 1 & x_{n1} \end{bmatrix} = \begin{bmatrix} n & \sum_{i=1}^{n} x_{i1} \\ \sum_{i=1}^{n} x_{i1} & \sum_{i=1}^{n} x_{i1}^2 \end{bmatrix}$$

Page 54:

The Inverse of a Matrix (A⁻¹)

• For an n×n matrix A, there may be a B such that AB = I = BA.
• The inverse is analogous to a reciprocal.
• A matrix which has an inverse is called "nonsingular".
• A matrix which does not have an inverse is "singular".
• An inverse exists only if the determinant of A does not equal 0: |A| ≠ 0.
• This is true only if the columns of the matrix are not linearly dependent (e.g., 2*column1 = column2).

Page 55:

How do we find inverses, determinants, and more?

$$\text{If } A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad \det(A) = |A| = ad - bc$$

$$\text{If } |A| \neq 0, \qquad A^{-1} = \frac{1}{\det(A)}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
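Stata's det() and inv() matrix functions implement exactly this; a small sketch with made-up numbers:

* Determinant and inverse of a 2x2 matrix
matrix A = (1,2 \ 3,4)
display det(A)       // 1*4 - 2*3 = -2
matrix Ainv = inv(A)
matrix list Ainv     // (1/-2) * (4,-2 \ -3,1) = (-2,1 \ 1.5,-.5)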

Page 56:

Deriving βhat for Multiple Regression

• Begin again with minimizing squared errors:

$$\hat u'\hat u = \sum_{i=1}^{n} \hat u_i^2 = (y - X\hat\beta)'(y - X\hat\beta) = y'y - 2\hat\beta'X'y + \hat\beta'X'X\hat\beta$$

• Take the partial derivative with respect to each element of β̂.

Page 57:

Deriving βhat for Multiple Regression

• Setting the derivative equal to 0 for minimization yields:

$$\frac{\partial\,\hat u'\hat u}{\partial\hat\beta} = -2X'y + 2X'X\hat\beta = 0$$

• Notice that this statement implies k separate equations with k unknowns,
– where k = the # of parameters in the β̂ vector.

Page 58:

Deriving βhat for Multiple Regression

• Rearranging terms, this becomes:

$$X'X\hat\beta = X'y$$

• We want to divide both sides by X'X, which means multiplying by its inverse (just like with scalar fractions).
• Thus, we must take the inverse of X'X.

Page 59:

Deriving βhat for Multiple Regression

• (X'X)⁻¹ will exist only if X'X is non-singular (is of full rank).
• That is, none of the x's can be a perfect linear function of the other x's.
• If this is true, then:

$$\hat\beta = (X'X)^{-1}X'y$$

Essentially the same as scalar notation: the covariance of the X matrix and the y vector over the variance-covariance matrix of X.

Page 60:

The Meaning of βhat in Multiple Regression

• Each element of the vector β̂ is a slope coefficient for one of the x's.
• Same as in the bivariate context, except that β̂1 is the expected change in y for a 1-unit increase in x1, while holding x2…xn constant.
• Thus β̂1 represents the direct effect of x1 on y, controlling for x2…xn.

Page 61:

How does βhat Do That?

• Calculation of each element of the βhat vector includes information from observations on all the independent variables and the dependent variable

• The computation includes information about covariation among the x’s

• It also includes information about the relationship between other x’s and y

Page 62:

Scalar and Matrix Notation

$$\text{Scalar:}\quad \hat\beta_k = \frac{\sum_i (x_{ik} - \hat x_{ik})\, y_i}{\sum_i (x_{ik} - \hat x_{ik})^2} \qquad\qquad \text{Matrix:}\quad \hat\beta = (X'X)^{-1}X'y$$

Page 63:

Let’s Do An Example

1. A matrix of independent variables X and a vector dependent variable y:

X[5,3]
        x1           x2           x3
r1       1   -6.5837102   -12.336432
r2       1   -17.004963    3.0143378
r3       1   -1.9127336   -15.459048
r4       1    12.721842    1.3890865
r5       1    .54984921    11.332677

y[5,1]
              Y
r1   -2.9141634
r2   -53.135086
r3    12.545163
r4    38.019218
r5   -3.9364302

Page 64:

Let's See Perfect Collinearity

• Note that x4 = 2*x3

X[5,4]
        x1           x2           x3           x4
r1       1   -6.5837102   -12.336432   -24.672865
r2       1   -17.004963    3.0143378    6.0286756
r3       1   -1.9127336   -15.459048   -30.918097
r4       1    12.721842    1.3890865     2.778173
r5       1    .54984921    11.332677    22.665354

. matrix XtX = X'*X

. display det(XtX)
0

• Determinant of X'X = 0; thus X'X is not of full rank.

Page 65:

Regressing Y on X2 & X3

• Create the TRANSPOSE of X:

. matrix Xt = X'

Xt[3,5]
             r1           r2           r3           r4           r5
X1            1            1            1            1            1
X2   -6.5837102   -17.004963   -1.9127336    12.721842    .54984921
X3   -12.336432    3.0143378   -15.459048    1.3890865    11.332677

• Create the X'X matrix (the variation & covariation of the X's):

. matrix XtX = X'*X

symmetric XtX[3,3]
             X1           X2           X3
X1            5
X2   -12.229716    498.32015
X3    -12.05938    83.432836     530.6151

Page 66:

Regressing Y on X2 and X3

• Note the matrix is of FULL RANK, so it CAN BE INVERTED:

. display det(XtX)
1160053.6

. matrix B = inv(X'*X)*X'*y

B[3,1]
              Y
X1    3.1000365
X2    3.0111852
X3   -.98715343

Page 67:

Now Do It With STATA

. reg Y X2 X3

      Source |       SS       df       MS         Number of obs =       5
  -----------+------------------------------     F(  2,     2) =  256.94
       Model |  4415.23139     2  2207.61569     Prob > F      =  0.0039
    Residual |  17.1837301     2  8.59186507     R-squared     =  0.9961
  -----------+------------------------------     Adj R-squared =  0.9922
       Total |  4432.41512     4  1108.10378     Root MSE      =  2.9312

  ----------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t     P>|t|    [95% Conf. Interval]
  -----------+----------------------------------------------------------------
          X2 |   3.011185   .1362818   22.095   0.002     2.424812    3.597558
          X3 |  -.9871534   .1317047   -7.495   0.017    -1.553833   -.4204737
       _cons |   3.100036   1.380879    2.245   0.154    -2.841404    9.041477
  ----------------------------------------------------------------------------

Page 68:

And so, calculations grind to a halt...

. matrix b = inv(X'*X)*X'*y
matrix has missing values
r(504);

. reg Y X2 X3 X4

      Source |       SS       df       MS         Number of obs =       5
  -----------+------------------------------     F(  2,     2) =  256.94
       Model |  4415.23139     2  2207.61569     Prob > F      =  0.0039
    Residual |  17.1837301     2  8.59186507     R-squared     =  0.9961
  -----------+------------------------------     Adj R-squared =  0.9922
       Total |  4432.41512     4  1108.10378     Root MSE      =  2.9312

  ----------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t     P>|t|    [95% Conf. Interval]
  -----------+----------------------------------------------------------------
          X2 |   3.011185   .1362818   22.095   0.002     2.424812    3.597558
          X3 |  (dropped)
          X4 |  -.4935767   .0658524   -7.495   0.017    -.7769166   -.2102369
       _cons |   3.100036   1.380879    2.245   0.154    -2.841404    9.041477
  ----------------------------------------------------------------------------

Page 69:

Multivariate Proof of the Gauss-Markov Theorem with Matrix Notation

Optional!

Page 70:

Imagine Another Estimator of the Slope Vector β

• Estimator β# is a function of y and some set of weights C.
• The set of weights C can be rewritten as our OLS estimator plus some matrix of weights D:

$$\beta^{\#} = Cy = [(X'X)^{-1}X' + D]\,y$$

Note that this is also a linear estimator of β.

Page 71:

Calculating the Expectation of β#

• Take expectations of both sides and substitute Xβ + u for y, as this is the population definition of y:

$$E(\beta^{\#}) = E\big[[(X'X)^{-1}X' + D][X\beta + u]\big]$$

• Separate out the (X'X)⁻¹X' and D terms to yield:

$$E(\beta^{\#}) = E\big[(X'X)^{-1}X'X\beta + (X'X)^{-1}X'u + DX\beta + Du\big]$$

Page 72:

Calculating the Expectation of β#

• Take the expectation of each term:

$$E(\beta^{\#}) = E[(X'X)^{-1}X'X\beta] + E[(X'X)^{-1}X'u] + E[DX\beta] + E[Du]$$

• Recall that (X'X)⁻¹X'Xβ = (I)β = β, and E(u) = 0:

$$E(\beta^{\#}) = \beta + (X'X)^{-1}X'E[u] + DX\beta + DE[u] = \beta + DX\beta$$

• Pull out the identity matrix (I) in order to separate β:

$$E(\beta^{\#}) = (I + DX)\beta$$

Page 73:

Recall β# Must be Unbiased

• Unbiased: Since DXβ is non-stochastic, if β# is unbiased, meaning E(β#) = β, then DX must equal 0. In other words, the weights must equal 0 for β# to be unbiased:

$$E(\beta^{\#}) = (I + DX)\beta = \beta \quad\Longrightarrow\quad DX = 0$$

• Efficiency: If DX = 0, then we can write β# as (using F.O.I.L.):

$$\beta^{\#} = [(X'X)^{-1}X' + D][X\beta + u] = \underbrace{(X'X)^{-1}X'X\beta}_{=(I)\beta\,=\,\beta} + (X'X)^{-1}X'u + DX\beta + Du = \beta + (X'X)^{-1}X'u + Du$$

Page 74:

Determining the Variance of β#

• Now, subtract β to calculate the distance of β# from β:

$$\beta^{\#} - \beta = (X'X)^{-1}X'u + Du$$

• To get the variance, we square the errors (the right side of the equation above):

$$E[(\beta^{\#}-\beta)(\beta^{\#}-\beta)'] = E\big[[(X'X)^{-1}X' + D]\,uu'\,[X(X'X)^{-1} + D']\big]$$

Page 75:

Determining the Variance of β#

• Since E(uu') = σ²I:

$$E[(\beta^{\#}-\beta)(\beta^{\#}-\beta)'] = \sigma^2\,[(X'X)^{-1}X' + D][X(X'X)^{-1} + D']$$

$$= \sigma^2\,[(X'X)^{-1}X'X(X'X)^{-1} + (X'X)^{-1}X'D' + DX(X'X)^{-1} + DD'] \quad \text{(F.O.I.L.)}$$

• Recall that DX = 0 and therefore X'D' = 0:

$$= \sigma^2\,[(X'X)^{-1} + DD']$$

Page 76:

Variance of β̂ < Variance of β#

• The first term of this equation is the variance of the OLS β̂; adding the squared weights gives the variance of β#:

$$\operatorname{Var}(\beta^{\#}) = \sigma^2(X'X)^{-1} + \sigma^2 DD'$$

• We are concerned with variance, the diagonal elements of the matrix.
• The diagonal elements of DD' must be positive; they are sums of squares.

Page 77:

Variance of β̂ < Variance of β#

• If the diagonal elements of DD' are positive, then the variance of β# > the variance of β̂,
• unless D = 0, but then β# = β̂.

$$\operatorname{Var}(\beta^{\#}) = \sigma^2(X'X)^{-1} + \sigma^2 DD'$$

Page 78:

OLS is BLUE!

• Thus among the class of linear estimators (β# = Cy)
• that are also unbiased linear estimators (E(β#) = β),
• OLS has the least variance.
• Thus OLS is the Best Linear Unbiased Estimator:
– It's BLUE!

Page 79:

Building on the Foundations of the OLS Estimator

• We needed a number of assumptions to justify the claim that OLS is BLUE

• Next time, we will review the assumptions underlying our use of OLS

Page 80:

Building on the Foundations of the OLS Estimator

• Then we will discuss how to assess the performance of regression models

• Then we begin addressing the violation of our central assumptions and how to repair our estimator of the vector βhat