Gov 2000: 10. Multiple Regression in Matrix Form
Matthew Blackwell, Fall 2016
1. Matrix algebra review
2. Matrix Operations
3. Linear model in matrix form
4. OLS in matrix form
5. OLS inference in matrix form
Where are we? Where are we going?
• Last few weeks: regression estimation and inference with one and two independent variables, varying effects
• This week: the general regression model with arbitrary covariates
• Next week: what happens when assumptions are wrong
Nunn & Wantchekon
• Are there long-term, persistent effects of the slave trade on Africans today?
• Basic idea: compare levels of interpersonal trust ($y_i$) across different levels of historical slave exports for a respondent's ethnic group
• Problem: ethnic groups and respondents might differ in their interpersonal trust in ways that correlate with the severity of slave exports
• One solution: try to control for relevant differences between groups via multiple regression
Nunn & Wantchekon
• Whaaaaa? Bold letters, quotation marks, what is this?
• Today's goal is to decipher this type of writing
Multiple Regression in R

nunn <- foreign::read.dta("../data/Nunn_Wantchekon_AER_2011.dta")
mod <- lm(trust_neighbors ~ exports + age + male + urban_dum
          + malaria_ecology, data = nunn)
summary(mod)

## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)      1.5030370  0.0218325   68.84   <2e-16 ***
## exports         -0.0010208  0.0000409  -24.94   <2e-16 ***
## age              0.0050447  0.0004724   10.68   <2e-16 ***
## male             0.0278369  0.0138163    2.01    0.044 *
## urban_dum       -0.2738719  0.0143549  -19.08   <2e-16 ***
## malaria_ecology  0.0194106  0.0008712   22.28   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.978 on 20319 degrees of freedom
##   (1497 observations deleted due to missingness)
## Multiple R-squared: 0.0604, Adjusted R-squared: 0.0602
## F-statistic: 261 on 5 and 20319 DF, p-value: <2e-16
Why matrices and vectors?
• Here's one way to write the full multiple regression model:

$y_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ik}\beta_k + u_i$

• Notation is going to get needlessly messy as we add variables.
• Matrices are clean, but they are like a foreign language.
• You need to build intuitions over a long period of time.
Quick note about interpretation
$y_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ik}\beta_k + u_i$

• In this model, $\beta_1$ is the effect of a one-unit change in $x_{i1}$, conditional on all other $x_{ij}$.
• Jargon: "partial effect," "ceteris paribus," "all else equal," "conditional on the covariates," etc.
• Notation change: lower-case letters here are random variables.
1/ Matrix algebra review
Vectors
• A vector is just a list of numbers (or random variables).
• A $1 \times k$ row vector has these numbers arranged in a row:

$\mathbf{a} = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_k \end{bmatrix}$

• A $k \times 1$ column vector arranges the numbers in a column:

$\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{bmatrix}$

• Convention: we'll assume that a vector is a column vector, and vectors will be written with lowercase bold lettering ($\mathbf{b}$)
Vector examples
• Vector of all covariates for a particular unit $i$:

$\mathbf{x}_i = \begin{bmatrix} 1 \\ x_{i1} \\ x_{i2} \\ \vdots \\ x_{ik} \end{bmatrix}$

• For the Nunn-Wantchekon data, we might have:

$\mathbf{x}_i = \begin{bmatrix} 1 \\ \text{exports}_i \\ \text{age}_i \\ \text{male}_i \end{bmatrix}$
Matrices
• A matrix is just a rectangular array of numbers.
• We say that a matrix is $n \times k$ ("$n$ by $k$") if it has $n$ rows and $k$ columns.
• Uppercase bold denotes a matrix:

$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nk} \end{bmatrix}$

• Generic entry: $a_{ij}$, the entry in row $i$ and column $j$
Examples of matrices
• One example of a matrix that we'll use a lot is the design matrix, which has a column of ones, and then each subsequent column is one of the independent variables in the regression.

$\mathbf{X} = \begin{bmatrix} 1 & \text{exports}_1 & \text{age}_1 & \text{male}_1 \\ 1 & \text{exports}_2 & \text{age}_2 & \text{male}_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & \text{exports}_n & \text{age}_n & \text{male}_n \end{bmatrix}$
Design matrix in R
head(model.matrix(mod), 8)
##   (Intercept) exports age male urban_dum malaria_ecology
## 1           1     855  40    0         0           28.15
## 2           1     855  25    1         0           28.15
## 3           1     855  38    1         1           28.15
## 4           1     855  37    0         1           28.15
## 5           1     855  31    1         0           28.15
## 6           1     855  45    0         0           28.15
## 7           1     855  20    1         0           28.15
## 8           1     855  31    0         0           28.15
dim(model.matrix(mod))
## [1] 20325 6
2/ Matrix Operations
Transpose
• The transpose of a matrix $\mathbf{A}$ is the matrix created by switching the rows and columns of the data, and is denoted $\mathbf{A}'$.
• The $i$th column of $\mathbf{A}$ becomes the $i$th row of $\mathbf{A}'$:

$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \qquad \mathbf{A}' = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \end{bmatrix}$

• If $\mathbf{A}$ is $n \times k$, then $\mathbf{A}'$ will be $k \times n$.
• Also written $\mathbf{A}^{T}$
Transposing vectors
• Transposing will turn a $k \times 1$ column vector into a $1 \times k$ row vector and vice versa:

$\mathbf{x}_i = \begin{bmatrix} 1 \\ x_{i1} \\ x_{i2} \\ \vdots \\ x_{ik} \end{bmatrix} \qquad \mathbf{x}'_i = \begin{bmatrix} 1 & x_{i1} & x_{i2} & \cdots & x_{ik} \end{bmatrix}$
Transposing in R
a <- matrix(1:6, ncol = 3, nrow = 2)
a

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

t(a)

##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
Write matrices as vectors
• A matrix is just a collection of vectors (row or column)
• As a row vector:

$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = \begin{bmatrix} \mathbf{a}'_1 \\ \mathbf{a}'_2 \end{bmatrix}$

with row vectors $\mathbf{a}'_1 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \end{bmatrix}$ and $\mathbf{a}'_2 = \begin{bmatrix} a_{21} & a_{22} & a_{23} \end{bmatrix}$

• Or we can define it in terms of column vectors:

$\mathbf{B} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 \end{bmatrix}$

where $\mathbf{b}_1$ and $\mathbf{b}_2$ represent the columns of $\mathbf{B}$.
• $j$ subscripts the columns of a matrix: $\mathbf{x}_j$
• $i$ and $t$ will be used for rows: $\mathbf{x}'_i$
Design matrix
• Design matrix as a series of row vectors:

$\mathbf{X} = \begin{bmatrix} 1 & \text{exports}_1 & \text{age}_1 & \text{male}_1 \\ 1 & \text{exports}_2 & \text{age}_2 & \text{male}_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & \text{exports}_n & \text{age}_n & \text{male}_n \end{bmatrix} = \begin{bmatrix} \mathbf{x}'_1 \\ \mathbf{x}'_2 \\ \vdots \\ \mathbf{x}'_n \end{bmatrix}$

• Design matrix as a series of column vectors:

$\mathbf{X} = \begin{bmatrix} \mathbf{1} & \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_k \end{bmatrix}$
Addition and subtraction
• How do we add or subtract matrices and vectors?
• First, the matrices/vectors need to be conformable, meaning that the dimensions have to be the same.
• Let $\mathbf{A}$ and $\mathbf{B}$ both be $2 \times 2$ matrices. Then let $\mathbf{C} = \mathbf{A} + \mathbf{B}$, where we add each cell together:

$\mathbf{A} + \mathbf{B} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix} = \mathbf{C}$
Scalar multiplication
• A scalar is just a single number: you can think of it sort of like a $1 \times 1$ matrix.
• When we multiply a scalar by a matrix, we just multiply each element/cell by that scalar:

$\alpha \mathbf{A} = \alpha \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} \alpha \times a_{11} & \alpha \times a_{12} \\ \alpha \times a_{21} & \alpha \times a_{22} \end{bmatrix}$
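These element-wise rules are easy to check in R; a minimal sketch, using small illustrative matrices rather than the lecture data:

```r
# Two conformable 2 x 2 matrices (illustrative values)
A <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2, ncol = 2)

A + B  # addition is cell-by-cell
3 * A  # a scalar multiplies every cell
```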
3/ Linear model in matrix form
The linear model with new notation
• Remember that we wrote the linear model as the following for all $i \in \{1, \ldots, n\}$:

$y_i = \beta_0 + x_i\beta_1 + z_i\beta_2 + u_i$

• Imagine we had an $n$ of 4. We could write out each formula:

$y_1 = \beta_0 + x_1\beta_1 + z_1\beta_2 + u_1 \quad \text{(unit 1)}$
$y_2 = \beta_0 + x_2\beta_1 + z_2\beta_2 + u_2 \quad \text{(unit 2)}$
$y_3 = \beta_0 + x_3\beta_1 + z_3\beta_2 + u_3 \quad \text{(unit 3)}$
$y_4 = \beta_0 + x_4\beta_1 + z_4\beta_2 + u_4 \quad \text{(unit 4)}$
The linear model with new notation
$y_1 = \beta_0 + x_1\beta_1 + z_1\beta_2 + u_1 \quad \text{(unit 1)}$
$y_2 = \beta_0 + x_2\beta_1 + z_2\beta_2 + u_2 \quad \text{(unit 2)}$
$y_3 = \beta_0 + x_3\beta_1 + z_3\beta_2 + u_3 \quad \text{(unit 3)}$
$y_4 = \beta_0 + x_4\beta_1 + z_4\beta_2 + u_4 \quad \text{(unit 4)}$

• We can write this as:

$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \beta_0 + \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \beta_1 + \begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{bmatrix} \beta_2 + \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}$

• Outcome is a linear combination of the $\mathbf{x}$, $\mathbf{z}$, and $\mathbf{u}$ vectors
Grouping things into matrices
• Can we write this in a more compact form? Yes! Let $\mathbf{X}$ and $\boldsymbol{\beta}$ be the following:

$\underset{(4 \times 3)}{\mathbf{X}} = \begin{bmatrix} 1 & x_1 & z_1 \\ 1 & x_2 & z_2 \\ 1 & x_3 & z_3 \\ 1 & x_4 & z_4 \end{bmatrix} \qquad \underset{(3 \times 1)}{\boldsymbol{\beta}} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}$
Matrix multiplication by a vector
• We can write this more compactly as a matrix (post-)multiplied by a vector:

$\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \beta_0 + \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \beta_1 + \begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{bmatrix} \beta_2 = \mathbf{X}\boldsymbol{\beta}$

• Multiplication of a matrix by a vector is just the linear combination of the columns of the matrix, with the vector elements as weights/coefficients.
• And the left-hand side here only uses scalars times vectors, which is easy!
General matrix by vector multiplication
• $\mathbf{A}$ is an $n \times k$ matrix
• $\mathbf{b}$ is a $k \times 1$ column vector
• Columns of $\mathbf{A}$ have to match rows of $\mathbf{b}$
• Let $\mathbf{a}_j$ be the $j$th column of $\mathbf{A}$. Then we can write:

$\underset{(n \times 1)}{\mathbf{c}} = \mathbf{A}\mathbf{b} = b_1\mathbf{a}_1 + b_2\mathbf{a}_2 + \cdots + b_k\mathbf{a}_k$

• $\mathbf{c}$ is a linear combination of the columns of $\mathbf{A}$
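We can verify this column-weighting view of matrix-by-vector multiplication in R; the matrix and vector below are made up for illustration:

```r
# c = Ab equals the linear combination of A's columns weighted by b
A <- matrix(c(1, 0, 2, 1, 3, 1), nrow = 2, ncol = 3)
b <- c(2, -1, 4)

by_multiplication <- as.numeric(A %*% b)
by_columns <- b[1] * A[, 1] + b[2] * A[, 2] + b[3] * A[, 3]
all.equal(by_multiplication, by_columns)  # TRUE
```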
Back to regression
• $\mathbf{X}$ is the $n \times (k+1)$ design matrix of independent variables
• $\boldsymbol{\beta}$ is the $(k+1) \times 1$ column vector of coefficients
• $\mathbf{X}\boldsymbol{\beta}$ will be $n \times 1$:

$\mathbf{X}\boldsymbol{\beta} = \beta_0 + \beta_1\mathbf{x}_1 + \beta_2\mathbf{x}_2 + \cdots + \beta_k\mathbf{x}_k$

• Thus, we can compactly write the linear model as the following:

$\underset{(n \times 1)}{\mathbf{y}} = \underset{(n \times 1)}{\mathbf{X}\boldsymbol{\beta}} + \underset{(n \times 1)}{\mathbf{u}}$
Inner product
• The inner (or dot) product of two column vectors $\mathbf{a}$ and $\mathbf{b}$ (of equal dimension, $k \times 1$):

$\langle \mathbf{a}, \mathbf{b} \rangle = \mathbf{a}'\mathbf{b} = a_1 b_1 + a_2 b_2 + \cdots + a_k b_k$

• If $\mathbf{a}'\mathbf{b} = 0$, we say that the two vectors are orthogonal.
• With $\mathbf{c} = \mathbf{A}\mathbf{b}$, we can write the entries of $\mathbf{c}$ as inner products:

$c_i = \mathbf{a}'_i\mathbf{b}$

• If $\mathbf{x}'_i$ is the $i$th row of $\mathbf{X}$, then we write the linear model as:

$y_i = \mathbf{x}'_i\boldsymbol{\beta} + u_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ik}\beta_k + u_i$
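A quick R sketch of inner products, with vectors chosen purely for illustration:

```r
# Inner product a'b as a sum of element-wise products
a <- c(1, 2, 3)
b <- c(4, 5, 6)
sum(a * b)             # 32
drop(crossprod(a, b))  # 32 again: crossprod(a, b) computes t(a) %*% b

# Orthogonality: a'o = 0
o <- c(1, -2, 1)
sum(a * o)             # 0
```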
4/ OLS in matrix form
Matrix multiplication
• What if, instead of a column vector $\mathbf{b}$, we have a matrix $\mathbf{B}$ with dimensions $k \times m$?
• How do we do multiplication like $\mathbf{C} = \mathbf{A}\mathbf{B}$?
• Each column of the new matrix is just matrix-by-vector multiplication:

$\mathbf{C} = \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_m \end{bmatrix} \qquad \mathbf{c}_j = \mathbf{A}\mathbf{b}_j$

• Thus, each column of $\mathbf{C}$ is a linear combination of the columns of $\mathbf{A}$.
Properties of matrix multiplication
• Matrix multiplication is not commutative: $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$
• It is associative and distributive:

$\mathbf{A}(\mathbf{B}\mathbf{C}) = (\mathbf{A}\mathbf{B})\mathbf{C}$
$\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}$

• The transpose of a product: $(\mathbf{A}\mathbf{B})' = \mathbf{B}'\mathbf{A}'$
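These properties can be spot-checked in R with small illustrative matrices:

```r
A <- matrix(1:6, nrow = 2, ncol = 3)
B <- matrix(1:6, nrow = 3, ncol = 2)

# (AB)' = B'A'
all.equal(t(A %*% B), t(B) %*% t(A))  # TRUE

# Non-commutative: here AB is 2 x 2 while BA is 3 x 3
dim(A %*% B)
dim(B %*% A)
```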
Square matrices and the diagonal
• A square matrix has equal numbers of rows and columns.
• The identity matrix, $\mathbf{I}_k$, is a $k \times k$ square matrix with 1s along the diagonal and 0s everywhere else.

$\mathbf{I}_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

• The $k \times k$ identity matrix multiplied by any conformable matrix returns that matrix:

$\mathbf{I}_k\mathbf{A} = \mathbf{A}$
Identity matrix
• To get the diagonal of a matrix in R, use the diag() function:

b <- matrix(1:4, nrow = 2, ncol = 2)
b

##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

diag(b)

## [1] 1 4

• diag() also creates identity matrices in R:

diag(3)

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
Multiple linear regression in matrix form

• Let $\hat{\boldsymbol{\beta}}$ be the vector of estimated regression coefficients and $\hat{\mathbf{y}}$ be the vector of fitted values:

$\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} \qquad \hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$

• It might be helpful to see this again more written out:

$\hat{\mathbf{y}} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix} = \mathbf{X}\hat{\boldsymbol{\beta}} = \begin{bmatrix} 1\hat{\beta}_0 + x_{11}\hat{\beta}_1 + x_{12}\hat{\beta}_2 + \cdots + x_{1k}\hat{\beta}_k \\ 1\hat{\beta}_0 + x_{21}\hat{\beta}_1 + x_{22}\hat{\beta}_2 + \cdots + x_{2k}\hat{\beta}_k \\ \vdots \\ 1\hat{\beta}_0 + x_{n1}\hat{\beta}_1 + x_{n2}\hat{\beta}_2 + \cdots + x_{nk}\hat{\beta}_k \end{bmatrix}$
Residuals
• We can easily write the residuals in matrix form:

$\hat{\mathbf{u}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}$

• The norm or length of a vector generalizes Euclidean distance and is just the square root of the sum of squared entries:

$\|\mathbf{a}\| = \sqrt{a_1^2 + a_2^2 + \cdots + a_k^2}$

• We can write the norm in terms of the inner product: $\|\mathbf{a}\|^2 = \mathbf{a}'\mathbf{a}$
• Thus we can compactly write the sum of squared residuals as:

$\|\hat{\mathbf{u}}\|^2 = \hat{\mathbf{u}}'\hat{\mathbf{u}} = \sum_{i=1}^{n} \hat{u}_i^2$
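In R, with an illustrative residual vector, the squared norm and the inner product give the same sum of squared residuals:

```r
u_hat <- c(0.5, -1.2, 0.3, 2.0)  # illustrative residuals

sum(u_hat^2)            # sum of squared residuals
drop(crossprod(u_hat))  # u'u, the same quantity
sqrt(sum(u_hat^2))      # the norm ||u||
```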
OLS estimator in matrix form
• OLS still minimizes the sum of squared residuals:

$\hat{\boldsymbol{\beta}} = \underset{\mathbf{b} \in \mathbb{R}^{k+1}}{\arg\min}\ \|\hat{\mathbf{u}}\|^2 = \underset{\mathbf{b} \in \mathbb{R}^{k+1}}{\arg\min}\ \|\mathbf{y} - \mathbf{X}\mathbf{b}\|^2$

• Take (matrix) derivatives, set equal to 0
• Resulting first-order conditions:

$\mathbf{X}'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = 0$

• Rearranging:

$\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$

• In order to isolate $\hat{\boldsymbol{\beta}}$, we need to move the $\mathbf{X}'\mathbf{X}$ term to the other side of the equals sign.
• We've learned about matrix multiplication, but what about matrix "division"?
Scalar inverses
• What is division in its simplest form? $\frac{1}{a}$ is the value such that $a\frac{1}{a} = 1$.
• For some algebraic expression $au = b$, let's solve for $u$:

$\frac{1}{a}au = \frac{1}{a}b$
$u = \frac{b}{a}$

• Need a matrix version of this: $\frac{1}{a}$.
Matrix inverses
• Definition: If it exists, the inverse of a square matrix $\mathbf{A}$, denoted $\mathbf{A}^{-1}$, is the matrix such that $\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$.
• We can use the inverse to solve (systems of) equations:

$\mathbf{A}\mathbf{u} = \mathbf{b}$
$\mathbf{A}^{-1}\mathbf{A}\mathbf{u} = \mathbf{A}^{-1}\mathbf{b}$
$\mathbf{I}\mathbf{u} = \mathbf{A}^{-1}\mathbf{b}$
$\mathbf{u} = \mathbf{A}^{-1}\mathbf{b}$

• If the inverse exists, we say that $\mathbf{A}$ is invertible or nonsingular.
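In R, solve(A) returns the inverse and solve(A, b) solves the system directly; the system below is a made-up example:

```r
# Solve Au = b two ways
A <- matrix(c(2, 1, 1, 3), nrow = 2, ncol = 2)
b <- c(5, 10)

u1 <- as.numeric(solve(A) %*% b)  # explicit inverse, then multiply
u2 <- solve(A, b)                 # solves the system without forming the inverse
all.equal(u1, u2)  # TRUE
```

In practice solve(A, b) is preferred numerically over computing the inverse explicitly.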
Back to OLS
• Let's assume, for now, that the inverse of $\mathbf{X}'\mathbf{X}$ exists (we'll come back to this)
• Then we can write the OLS estimator as the following:

$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$

• Memorize this: "ex prime ex inverse ex prime y"; sear it into your soul.
Understanding check
• Suppose $\mathbf{y}$ is $n \times 1$ and $\mathbf{X}$ is $n \times (k+1)$.
• What are the dimensions of $\mathbf{X}'\mathbf{X}$?
• True/False: $\mathbf{X}'\mathbf{X}$ is symmetric.
  ▶ Note: a square matrix is symmetric if $\mathbf{A} = \mathbf{A}'$.
• What are the dimensions of $(\mathbf{X}'\mathbf{X})^{-1}$?
• What are the dimensions of $\mathbf{X}'\mathbf{y}$?
• What are the dimensions of $\hat{\boldsymbol{\beta}}$?
Implications of OLS
• We can generalize some mechanical results about OLS.
• The independent variables are orthogonal to the residuals:

$\mathbf{X}'\hat{\mathbf{u}} = \mathbf{X}'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = 0$

• The fitted values are orthogonal to the residuals:

$\hat{\mathbf{y}}'\hat{\mathbf{u}} = (\mathbf{X}\hat{\boldsymbol{\beta}})'\hat{\mathbf{u}} = \hat{\boldsymbol{\beta}}'\mathbf{X}'\hat{\mathbf{u}} = 0$
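Both orthogonality results can be confirmed numerically; since the Nunn-Wantchekon data aren't loaded here, this sketch simulates its own data:

```r
# Check X'u_hat = 0 and y_hat'u_hat = 0 (up to floating-point error)
set.seed(1)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)

fit   <- lm(y ~ x1 + x2)
X     <- model.matrix(fit)
u_hat <- residuals(fit)

max(abs(t(X) %*% u_hat))       # essentially zero
abs(sum(fitted(fit) * u_hat))  # essentially zero
```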
OLS by hand in R
$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$

• First we need to get the design matrix and the response:

X <- model.matrix(trust_neighbors ~ exports + age + male
                  + urban_dum + malaria_ecology, data = nunn)
dim(X)

## [1] 20325 6

## model.frame always puts the response in the first column
y <- model.frame(trust_neighbors ~ exports + age + male
                 + urban_dum + malaria_ecology, data = nunn)[, 1]
length(y)

## [1] 20325
OLS by hand in R
$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$

• Use solve() for inverses and %*% for matrix multiplication:

solve(t(X) %*% X) %*% t(X) %*% y

##      (Intercept)   exports      age    male urban_dum
## [1,]       1.503 -0.001021 0.005045 0.02784   -0.2739
##      malaria_ecology
## [1,]         0.01941

coef(mod)

##     (Intercept)         exports             age            male
##        1.503037       -0.001021        0.005045        0.027837
##       urban_dum malaria_ecology
##       -0.273872        0.019411
Intuition for the OLS in matrix form
$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$

• What's the intuition here?
• "Numerator" $\mathbf{X}'\mathbf{y}$: roughly composed of the covariances between the columns of $\mathbf{X}$ and $\mathbf{y}$
• "Denominator" $\mathbf{X}'\mathbf{X}$: roughly composed of the sample variances and covariances of the variables within $\mathbf{X}$
• Thus, we have something like:

$\hat{\boldsymbol{\beta}} \approx (\text{variance of } \mathbf{X})^{-1}(\text{covariance of } \mathbf{X} \text{ and } \mathbf{y})$

• This is a rough sketch and isn't strictly true, but it can provide intuition.
5/ OLS inference in matrix form
Random vectors
• A random vector is a vector of random variables:

$\mathbf{x}_i = \begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix}$

• Here, $\mathbf{x}_i$ is a random vector and $x_{i1}$ and $x_{i2}$ are random variables.
• When we talk about the distribution of $\mathbf{x}_i$, we are talking about the joint distribution of $x_{i1}$ and $x_{i2}$.
Distribution of random vectors
• Expectation of random vectors:

$\mathbb{E}[\mathbf{x}_i] = \begin{bmatrix} \mathbb{E}[x_{i1}] \\ \mathbb{E}[x_{i2}] \end{bmatrix}$

• Variance of random vectors:

$\mathbb{V}[\mathbf{x}_i] = \begin{bmatrix} \mathbb{V}[x_{i1}] & \text{Cov}[x_{i1}, x_{i2}] \\ \text{Cov}[x_{i1}, x_{i2}] & \mathbb{V}[x_{i2}] \end{bmatrix}$

• Properties of this variance-covariance matrix:
  ▶ if $\mathbf{a}$ is constant, then $\mathbb{V}[\mathbf{a}'\mathbf{x}_i] = \mathbf{a}'\mathbb{V}[\mathbf{x}_i]\mathbf{a}$
  ▶ if matrix $\mathbf{A}$ and vector $\mathbf{b}$ are constant, then $\mathbb{V}[\mathbf{A}\mathbf{x}_i + \mathbf{b}] = \mathbf{A}\mathbb{V}[\mathbf{x}_i]\mathbf{A}'$
Most general OLS assumptions
1. Linearity: $y_i = \mathbf{x}'_i\boldsymbol{\beta} + u_i$
2. Random/iid sample: $(y_i, \mathbf{x}'_i)$ are an iid sample from the population.
3. No perfect collinearity: $\mathbf{X}$ is an $n \times (k+1)$ matrix with rank $k+1$
4. Zero conditional mean: $\mathbb{E}[u_i|\mathbf{x}_i] = 0$
5. Homoskedasticity: $\mathbb{V}[u_i|\mathbf{x}_i] = \sigma^2_u$
6. Normality: $u_i|\mathbf{x}_i \sim N(0, \sigma^2_u)$
Matrix rank
• Definition: The rank of a matrix is the maximum number of linearly independent columns.
• Definition: The columns of a matrix $\mathbf{X}$ are linearly independent if $\mathbf{X}\mathbf{c} = 0$ if and only if $\mathbf{c} = 0$, where

$\mathbf{X}\mathbf{c} = c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \cdots + c_k\mathbf{x}_k$

• Example violation: one column is a linear function of the others.
  ▶ 3 covariates with $\mathbf{x}_1 = \mathbf{x}_2 + \mathbf{x}_3$:

$0 = c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + c_3\mathbf{x}_3 = c_1(\mathbf{x}_2 + \mathbf{x}_3) + c_2\mathbf{x}_2 + c_3\mathbf{x}_3 = (c_1 + c_2)\mathbf{x}_2 + (c_1 + c_3)\mathbf{x}_3$

• …which equals 0 when $c_1 = -c_2 = -c_3$, so the columns are not linearly independent!
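A small simulated example in R shows how an exact linear dependence reduces the rank of the design matrix:

```r
# Construct x1 = x2 + x3, so the columns are linearly dependent
set.seed(2)
x2 <- rnorm(10)
x3 <- rnorm(10)
x1 <- x2 + x3
X  <- cbind(1, x1, x2, x3)  # n x 4 design matrix

qr(X)$rank  # 3, not 4: perfect collinearity
```

lm() detects this situation via pivoting and reports an NA coefficient for one of the collinear columns rather than failing outright.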
Rank and matrix inversion
• If $\mathbf{X}$ is $n \times (k+1)$ and has rank $k+1$, then all of its columns are linearly independent
  ▶ Generalization of no perfect collinearity to arbitrary $k$.
• $\mathbf{X}$ has rank $k+1$ ⇒ $(\mathbf{X}'\mathbf{X})$ has rank $k+1$
• If a square $(k+1) \times (k+1)$ matrix has rank $k+1$, then it is invertible.
• $\mathbf{X}$ has rank $k+1$ ⇒ $(\mathbf{X}'\mathbf{X})^{-1}$ exists and is unique.
Zero conditional mean error
• Combining the zero conditional mean error and iid assumptions, we have:

$\mathbb{E}[u_i|\mathbf{X}] = \mathbb{E}[u_i|\mathbf{x}_i] = 0$

• Stacking these into the vector of errors:

$\mathbb{E}[\mathbf{u}|\mathbf{X}] = \begin{bmatrix} \mathbb{E}[u_1|\mathbf{X}] \\ \mathbb{E}[u_2|\mathbf{X}] \\ \vdots \\ \mathbb{E}[u_n|\mathbf{X}] \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$
Expectation of OLS
• Useful to write OLS as:

$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$
$\quad = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol{\beta} + \mathbf{u})$
$\quad = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}$
$\quad = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}$

• Under assumptions 1-4, OLS is conditionally unbiased for $\boldsymbol{\beta}$:

$\mathbb{E}[\hat{\boldsymbol{\beta}}|\mathbf{X}] = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbb{E}[\mathbf{u}|\mathbf{X}] = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{0} = \boldsymbol{\beta}$

• Implies that OLS is unconditionally unbiased: $\mathbb{E}[\hat{\boldsymbol{\beta}}] = \boldsymbol{\beta}$
Variance of OLS
• What about $\mathbb{V}[\hat{\boldsymbol{\beta}}|\mathbf{X}]$?
• Using some facts about variances and matrices, we can derive:

$\mathbb{V}[\hat{\boldsymbol{\beta}}|\mathbf{X}] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbb{V}[\mathbf{u}|\mathbf{X}]\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$

• What is the covariance matrix of the errors, $\mathbb{V}[\mathbf{u}|\mathbf{X}]$?

$\mathbb{V}[\mathbf{u}|\mathbf{X}] = \begin{bmatrix} \mathbb{V}[u_1|\mathbf{X}] & \text{cov}[u_1, u_2|\mathbf{X}] & \cdots & \text{cov}[u_1, u_n|\mathbf{X}] \\ \text{cov}[u_2, u_1|\mathbf{X}] & \mathbb{V}[u_2|\mathbf{X}] & \cdots & \text{cov}[u_2, u_n|\mathbf{X}] \\ \vdots & & \ddots & \vdots \\ \text{cov}[u_n, u_1|\mathbf{X}] & \text{cov}[u_n, u_2|\mathbf{X}] & \cdots & \mathbb{V}[u_n|\mathbf{X}] \end{bmatrix}$

• This matrix is symmetric since $\text{cov}(u_i, u_j) = \text{cov}(u_j, u_i)$
Homoskedasticity
• By homoskedasticity and iid, for any units $i$, $s$, $t$:
  ▶ $\mathbb{V}[u_i|\mathbf{X}] = \mathbb{V}[u_i|\mathbf{x}_i] = \sigma^2_u$ (constant variance)
  ▶ $\text{cov}[u_s, u_t|\mathbf{X}] = 0$ (uncorrelated errors)
• Then the covariance matrix of the errors is simply:

$\mathbb{V}[\mathbf{u}|\mathbf{X}] = \sigma^2_u\mathbf{I}_n = \begin{bmatrix} \sigma^2_u & 0 & \cdots & 0 \\ 0 & \sigma^2_u & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2_u \end{bmatrix}$

• Thus, we have the following:

$\mathbb{V}[\hat{\boldsymbol{\beta}}|\mathbf{X}] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbb{V}[\mathbf{u}|\mathbf{X}]\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$
$\quad = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\sigma^2_u\mathbf{I}_n)\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$
$\quad = \sigma^2_u(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$
$\quad = \sigma^2_u(\mathbf{X}'\mathbf{X})^{-1}$
Sampling variance for OLS estimates
• Under assumptions 1-5, the sampling variance of the OLS estimator can be written in matrix form as the following:

$\mathbb{V}[\hat{\boldsymbol{\beta}}|\mathbf{X}] = \sigma^2_u(\mathbf{X}'\mathbf{X})^{-1}$

• This symmetric matrix looks like this:

$\begin{bmatrix} \mathbb{V}[\hat{\beta}_0|\mathbf{X}] & \text{Cov}[\hat{\beta}_0, \hat{\beta}_1|\mathbf{X}] & \cdots & \text{Cov}[\hat{\beta}_0, \hat{\beta}_k|\mathbf{X}] \\ \text{Cov}[\hat{\beta}_0, \hat{\beta}_1|\mathbf{X}] & \mathbb{V}[\hat{\beta}_1|\mathbf{X}] & \cdots & \text{Cov}[\hat{\beta}_1, \hat{\beta}_k|\mathbf{X}] \\ \vdots & \vdots & \ddots & \vdots \\ \text{Cov}[\hat{\beta}_0, \hat{\beta}_k|\mathbf{X}] & \text{Cov}[\hat{\beta}_k, \hat{\beta}_1|\mathbf{X}] & \cdots & \mathbb{V}[\hat{\beta}_k|\mathbf{X}] \end{bmatrix}$
Inference in the general setting
• Under assumptions 1-5, in large samples:

$\frac{\hat{\beta}_j - \beta_j}{\widehat{se}[\hat{\beta}_j]} \sim N(0, 1)$

• In small samples, under assumptions 1-6:

$\frac{\hat{\beta}_j - \beta_j}{\widehat{se}[\hat{\beta}_j]} \sim t_{n-(k+1)}$

• Thus, under the null of $H_0: \beta_j = 0$, we know that

$\frac{\hat{\beta}_j}{\widehat{se}[\hat{\beta}_j]} \sim t_{n-(k+1)}$

• Here, the estimated SEs come from:

$\widehat{\mathbb{V}}[\hat{\boldsymbol{\beta}}] = \hat{\sigma}^2_u(\mathbf{X}'\mathbf{X})^{-1} \qquad \hat{\sigma}^2_u = \frac{\hat{\mathbf{u}}'\hat{\mathbf{u}}}{n - (k + 1)}$
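These formulas can be reproduced by hand in R; since the nunn data aren't loaded here, this sketch simulates its own data:

```r
# sigma2_hat = u'u / (n - (k + 1)), then Vhat[beta_hat] = sigma2_hat * (X'X)^{-1}
set.seed(3)
n <- 200
x <- rnorm(n)
z <- rnorm(n)
y <- 0.5 + x - 2 * z + rnorm(n)

fit    <- lm(y ~ x + z)
u_hat  <- residuals(fit)
k      <- 2  # covariates besides the intercept
s2_hat <- sum(u_hat^2) / (n - (k + 1))

all.equal(s2_hat, summary(fit)$sigma^2)  # TRUE

X <- model.matrix(fit)
all.equal(s2_hat * solve(crossprod(X)), vcov(fit))  # TRUE
```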
Covariance matrix in R
• We can access this estimated covariance matrix, $\hat{\sigma}^2_u(\mathbf{X}'\mathbf{X})^{-1}$, in R:

vcov(mod)

##                   (Intercept)    exports        age       male
## (Intercept)      0.0004766593  1.164e-07 -7.956e-06 -6.676e-05
## exports          0.0000001164  1.676e-09 -3.659e-10  7.283e-09
## age             -0.0000079562 -3.659e-10  2.231e-07 -7.765e-07
## male            -0.0000667572  7.283e-09 -7.765e-07  1.909e-04
## urban_dum       -0.0000965843 -4.861e-08  7.108e-07 -1.711e-06
## malaria_ecology -0.0000069094 -2.124e-08  2.324e-10 -1.017e-07
##                   urban_dum malaria_ecology
## (Intercept)      -9.658e-05      -6.909e-06
## exports          -4.861e-08      -2.124e-08
## age               7.108e-07       2.324e-10
## male             -1.711e-06      -1.017e-07
## urban_dum         2.061e-04       2.724e-09
## malaria_ecology   2.724e-09       7.590e-07
Standard errors from the covariance matrix
• Note that the diagonal entries are the variances, so the square root of the diagonal gives the standard errors:

sqrt(diag(vcov(mod)))

##     (Intercept)         exports             age            male
##      0.02183253      0.00004094      0.00047237      0.01381627
##       urban_dum malaria_ecology
##      0.01435491      0.00087123

coef(summary(mod))[, "Std. Error"]

##     (Intercept)         exports             age            male
##      0.02183253      0.00004094      0.00047237      0.01381627
##       urban_dum malaria_ecology
##      0.01435491      0.00087123
Nunn & Wantchekon
Wrapping up
• You have the full power of matrices.
• Key to writing the OLS estimator and discussing higher-level concepts in regression and beyond.
• Next week: diagnosing and fixing problems with the linear model.