econometrics_I definitions, theorems and facts (7/27/2019)

    1 Useful mathematics, notation and definitions

Fact 1.1. If we multiply a column vector $x$ by a matrix $A_{m\times n}$, this defines a function $A: \mathbb{R}^n \to \mathbb{R}^m$.

Notation 1. $A = [a_{(1)} \cdots a_{(n)}]$, $a_{(i)} \in \mathbb{R}^m$, refers to columns $1$ through $n$ of matrix $A$.

Notation 2. $A = \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix}$, $a_i \in \mathbb{R}^n$, refers to the rows of $A$.

Definition 1.1. $C(A) = \langle a_{(1)},\dots,a_{(n)}\rangle$ refers to the space spanned by the columns of $A$.

Proposition 1.1. $D = AB$ and $B$ nonsingular $\implies \operatorname{rank}(D) = \operatorname{rank}(A)$.

Proposition 1.2. Let $A_{m\times n}$; then $\operatorname{rank}(A) \le \min\{m, n\}$.

Proposition 1.3. $D = AB \implies \operatorname{rank}(D) \le \min\{\operatorname{rank}(A), \operatorname{rank}(B)\}$.

Proposition 1.4. Let $A_{m\times n}$; then $\operatorname{rank}(A) = \operatorname{rank}(AA') = \operatorname{rank}(A'A)$.
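These rank propositions are easy to check numerically. A minimal sketch with NumPy, where the matrices $A$ and $B$ are arbitrary illustrative values:

```python
import numpy as np

rank = np.linalg.matrix_rank

# An arbitrary 4x3 matrix A (m=4, n=3)
A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.],
              [1., 0., 1.]])
# Proposition 1.2: rank(A) <= min(m, n)
assert rank(A) <= min(A.shape)

# Proposition 1.4: rank(A) = rank(AA') = rank(A'A)
assert rank(A) == rank(A @ A.T) == rank(A.T @ A)

# Proposition 1.1: multiplying by a nonsingular B preserves rank
B = np.array([[2., 1., 0.],
              [0., 1., 0.],
              [1., 0., 3.]])
assert np.linalg.det(B) != 0            # B is nonsingular
assert rank(A @ B) == rank(A)

# Proposition 1.3: rank(AB) <= min(rank(A), rank(B))
assert rank(A @ B) <= min(rank(A), rank(B))
```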

    1.1 Some Vector Calculus

Consider the least-squares objective $S(\beta) = (Y - X\beta)'(Y - X\beta)$, to be minimized over $\beta$, where $Y$ is $n\times 1$, $X$ is $n\times k$ and $\beta$ is $k\times 1$.

Fact 1.2. Derivative with respect to a column vector:

$$\frac{\partial S(\beta)}{\partial \beta} = \begin{pmatrix} \partial S(\beta)/\partial\beta_1 \\ \vdots \\ \partial S(\beta)/\partial\beta_k \end{pmatrix}$$

Fact 1.3. Derivative with respect to a row vector:

$$\frac{\partial S(\beta)}{\partial \beta'} = \left( \frac{\partial S(\beta)}{\partial\beta_1}, \dots, \frac{\partial S(\beta)}{\partial\beta_k} \right)$$

Fact 1.4. Let $\psi(\cdot): \mathbb{R}^k \to \mathbb{R}^m$, so that

$$\psi(\beta) = \begin{pmatrix} \psi_1(\beta) \\ \psi_2(\beta) \\ \vdots \\ \psi_m(\beta) \end{pmatrix}$$

is a vector-valued mapping. Then

$$\frac{\partial \psi(\beta)}{\partial \beta'} = \begin{pmatrix} \partial\psi_1(\beta)/\partial\beta' \\ \vdots \\ \partial\psi_m(\beta)/\partial\beta' \end{pmatrix}$$

where each element is a row vector as in Fact 1.3, and $\partial\psi(\beta)/\partial\beta'$ is an $m\times k$ matrix.¹

Fact 1.5. Let $\psi(\cdot): \mathbb{R}^k \to \mathbb{R}^m$, with $\psi(\beta) = \left(\psi_1(\beta), \psi_2(\beta), \dots, \psi_m(\beta)\right)'$, be a vector-valued mapping. Then

$$\frac{\partial \psi'(\beta)}{\partial \beta} = \left( \frac{\partial\psi_1(\beta)}{\partial\beta}, \dots, \frac{\partial\psi_m(\beta)}{\partial\beta} \right)$$

where each element is a column vector as in Fact 1.2, and $\partial\psi'(\beta)/\partial\beta$ is a $k\times m$ matrix.

¹ The symbol $'$ means transpose.
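A finite-difference sketch of the shape conventions in Facts 1.4 and 1.5; the map `psi` below is an arbitrary example chosen only for illustration:

```python
import numpy as np

def psi(b):
    # Arbitrary illustrative map psi: R^3 -> R^2
    return np.array([b[0]**2 + b[1], b[1] * b[2]])

def jacobian(f, b, h=1e-6):
    """Numerical d f(b) / d b': an (m x k) matrix, as in Fact 1.4."""
    m, k = f(b).size, b.size
    J = np.zeros((m, k))
    for j in range(k):
        e = np.zeros(k)
        e[j] = h
        J[:, j] = (f(b + e) - f(b - e)) / (2 * h)   # central differences
    return J

b = np.array([1.0, 2.0, 3.0])
J = jacobian(psi, b)
assert J.shape == (2, 3)        # m x k, as in Fact 1.4
assert J.T.shape == (3, 2)      # Fact 1.5: the transpose is k x m
# Analytic Jacobian of psi at b: [[2*b0, 1, 0], [0, b2, b1]]
assert np.allclose(J, [[2., 1., 0.], [0., 3., 2.]], atol=1e-4)
```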


Fact 1.6. Let $y = Ax$ where $A$ does not depend on $x$; then $\partial y/\partial x' = A$, where $\partial y/\partial x'$ is an $m\times n$ matrix.

Fact 1.7. Let $y = Ax$ where $A$ does not depend on $x$; then $\partial y/\partial x = A'$, where $\partial y/\partial x$ is an $n\times m$ matrix.

Proposition 1.5. Suppose $y = Ax(\lambda)$, where $\lambda = (\lambda_1, \dots, \lambda_r)'$. If $A$ does not depend on $\lambda$, then

$$\frac{\partial y(\lambda)}{\partial \lambda'} = \frac{\partial y}{\partial x'}\,\frac{\partial x}{\partial \lambda'} = A\,\frac{\partial x(\lambda)}{\partial \lambda'}$$

and the resulting matrix is $m\times r$.

Note 1. $A\,\partial x'(\lambda)/\partial\lambda$ is not well defined. To see this:

$$\frac{\partial x'}{\partial \lambda} = \begin{pmatrix} \partial x_1/\partial\lambda_1 & \partial x_2/\partial\lambda_1 & \cdots & \partial x_n/\partial\lambda_1 \\ \partial x_1/\partial\lambda_2 & \partial x_2/\partial\lambda_2 & \cdots & \partial x_n/\partial\lambda_2 \\ \vdots & \vdots & \ddots & \vdots \\ \partial x_1/\partial\lambda_r & \partial x_2/\partial\lambda_r & \cdots & \partial x_n/\partial\lambda_r \end{pmatrix}_{r\times n},$$

but $A$ is $m\times n$, so the product does not conform.

Fact 1.8. $y = x'Bx = (x'Bx)' = x'B'x = y'$, since $y$ is a scalar.

Proposition 1.6. If Fact 1.8 holds, then we can always find a symmetric matrix $A$ such that $x'Bx = x'Ax$.

Proof. Let $A = \frac{1}{2}(B + B')$, which is symmetric. Then

$$y = \frac{1}{2}\left(x'Bx + x'B'x\right) = \frac{1}{2}\left[x'(B + B')x\right] = \frac{1}{2}\left(2x'Ax\right) = x'Ax = y'.$$
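A quick numerical sketch of Proposition 1.6, where the matrix $B$ and vector $x$ are arbitrary random draws:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))          # arbitrary (non-symmetric) matrix
x = rng.normal(size=4)

A = 0.5 * (B + B.T)                  # symmetric part of B
assert np.allclose(A, A.T)           # A is symmetric
# The quadratic form only "sees" the symmetric part: x'Bx = x'Ax
assert np.isclose(x @ B @ x, x @ A @ x)
```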

Proposition 1.7. $y = x'Ax$, with $A$ symmetric (as we may assume by Proposition 1.6), implies $\dfrac{\partial y}{\partial x'} = 2x'A$ and $\dfrac{\partial y}{\partial x} = 2Ax$.
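Proposition 1.7 can be verified against a central-difference gradient; the matrix $A$ below is an arbitrary symmetric example:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])             # arbitrary symmetric matrix
x = np.array([1.0, -2.0])
h = 1e-6

quad = lambda v: v @ A @ v           # y = x'Ax
# Central-difference gradient, one coordinate at a time
grad = np.array([(quad(x + h * np.eye(2)[i]) - quad(x - h * np.eye(2)[i])) / (2 * h)
                 for i in range(2)])
assert np.allclose(grad, 2 * A @ x, atol=1e-4)   # dy/dx = 2Ax
```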

1.2 Geometry of Least Squares

$y = x_1\beta_1 + \dots + x_k\beta_k$ implies that $y$ lives in the linear space generated by $\langle x_1,\dots,x_k\rangle \subseteq \mathbb{R}^n$; however, when we add $\varepsilon$, we account for all aspects of $y$ that do not live in $\langle x_1,\dots,x_k\rangle$.

Definition 1.2. We call $P$ a projection matrix if:

1. $P: \mathbb{R}^n \to \mathbb{R}^n$
2. $P = P'$
3. $P^2 = P$

Fact 1.9. In our particular case, $P = X(X'X)^{-1}X'$.
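The three defining properties, and Fact 1.9, can be checked numerically; a sketch with an arbitrary simulated design matrix $X$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))                 # n=20 observations, k=3 regressors
P = X @ np.linalg.inv(X.T @ X) @ X.T         # P = X(X'X)^{-1}X'

assert np.allclose(P, P.T)                   # P = P'
assert np.allclose(P @ P, P)                 # P^2 = P (idempotent)

# P projects any y in R^n onto the column space of X:
# the residual y - Py is orthogonal to every column of X
y = rng.normal(size=20)
assert np.allclose(X.T @ (y - P @ y), 0)
```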

2 Maximum likelihood method

Definition 2.1. Suppose that random variables $X_1,\dots,X_n$ have a joint density or frequency function $f(x_1, x_2,\dots,x_n\,|\,\theta)$. Given observed values $X_i = x_i$, where $i = 1,\dots,n$, the likelihood of $\theta$ as a function of $x_1, x_2,\dots,x_n$ is defined as

$$\operatorname{lik}(\theta) = f(x_1,\dots,x_n\,|\,\theta) \tag{1}$$


[Figure 1: Geometry of Least Squares. $y$ is projected onto the plane spanned by $x_1$ and $x_2$; the residual $e = y - X\beta$ is orthogonal to that plane.]

Definition 2.2. The maximum likelihood estimate is the value of $\theta$ which maximizes (1). In the particular case where the $X_i$ are assumed to be i.i.d., their joint density is the product of the marginal densities, and the likelihood is

$$\operatorname{lik}(\theta) = \prod_{i=1}^{n} f(X_i\,|\,\theta) \tag{2}$$

Rather than maximizing (1) itself, it is usually easier to maximize its natural logarithm (which is equivalent, since the logarithm is a monotonic function). For an i.i.d. sample, the log likelihood is

$$l(\theta) = \sum_{i=1}^{n} \ln\left[f(X_i\,|\,\theta)\right] \tag{3}$$
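A small sketch of maximizing (3) for an assumed i.i.d. $N(\mu, 1)$ sample (simulated data, arbitrary parameter values); for this model the MLE of $\mu$ has the closed form $\bar{x}$, which the grid search recovers:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=1.0, size=200)     # i.i.d. N(mu, 1), true mu = 5

def loglik(mu):
    # Equation (3): sum of log normal densities with sigma^2 = 1 known
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - mu) ** 2)

# Maximize the log likelihood over a grid of candidate values
grid = np.linspace(3, 7, 4001)
mu_hat = grid[np.argmax([loglik(m) for m in grid])]

# The grid maximizer matches the closed-form MLE, the sample mean
assert abs(mu_hat - x.mean()) < 1e-3
```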

    2.1 Hypothesis testing

Definition 2.3. Type I error. The procedure may lead to rejection of the null hypothesis when it is true.

Definition 2.4. Type II error. The procedure may fail to reject the null hypothesis when it is false, i.e., accepting $H_0$ when it is false.

State of nature | Accept $H_0$ | Reject $H_0$
$H_0$ true | Correct decision ($p = 1 - \alpha$) | Type I error (size of the test or significance level, denoted $\alpha$)
$H_a$ true | Type II error ($p = \beta$) | Prob. of rejecting $H_0$ when it is false: the power of the test ($p = 1 - \beta$)

Table 1: Type I and II errors


Definition 2.5. Power of a test. The power of a test is the probability that it will correctly lead to rejection of a false null hypothesis:

$$\text{Power} = 1 - \beta = 1 - P(\text{type II error})$$

Some difficulty may arise because $H_a: \theta < \theta_0$ only specifies a range, which in turn complicates by-hand calculations.

Example 2.1. Suppose that under $H_0$, $\mu = \mu_0$, and $H_a: E[D] = \delta$. Then the power of a given test is $\pi(\mu) = P(\bar{X} > c \,|\, \mu)$, that is, the probability that we reject $H_0$ when $H_a$ is true.

Remark 2.1. Notice that in the above example we are testing at a specific value of the alternative. For a case such as $H_a: \theta < \theta_0$, see Casella-Berger p. 383.

2.2 Distributions used in hypothesis testing under normality

$$z = \frac{\sqrt{n}\,(\bar{x} - \mu)}{s} \sim t[n-1] \tag{4}$$

$$c = \frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2[n-1] \tag{5}$$

See page 1062 of Econometric Analysis, 7th edition, for an example.
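A Monte Carlo sketch of (4) and (5): under normality, $z$ should be centered at zero and $c$ should have the $\chi^2[n-1]$ mean and variance. All sample sizes and parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
n, mu, sigma, reps = 25, 10.0, 2.0, 20000
x = rng.normal(mu, sigma, size=(reps, n))          # many samples under H0

s2 = x.var(axis=1, ddof=1)
z = np.sqrt(n) * (x.mean(axis=1) - mu) / np.sqrt(s2)   # equation (4)
c = (n - 1) * s2 / sigma**2                            # equation (5)

# t[n-1] is symmetric about zero; chi2[n-1] has mean n-1, variance 2(n-1)
assert abs(z.mean()) < 0.05
assert abs(c.mean() - (n - 1)) < 0.3
assert abs(c.var() - 2 * (n - 1)) < 3.0
```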

2.3 Wald Test

2.3.1 One linear restriction

Also known as the distance test or significance test.

$$W_k = \frac{\hat{\beta}_k - \beta_k}{\hat{\sigma}_{\hat{\beta}_k}} \sim t_{n-k} \tag{6}$$

The latter is the standard form of the test of a single restriction. In the case of one linear restriction on multiple parameters, denote by $c$ a $1\times k$ vector containing a restriction of the form $c\beta = d$, where $\beta$ is a $k\times 1$ vector.

$$W = \frac{c\hat{\beta} - d}{\sqrt{\hat{\sigma}^2\, c\,(X'X)^{-1} c'}} \sim t_{n-k} \tag{7}$$

Notice that the denominator in (7) is the standard deviation of the sum of the $\hat{\beta}_i$'s, taking $c$ into consideration. Finally, $W$ is the distance from $d$ in standard deviation units.

2.3.2 Multiple linear restrictions

For multiple linear restrictions $R\beta = q$, the test statistic takes the form

$$W = \left(R\hat{\beta} - q\right)' \left[\hat{\sigma}^2\, R\,(X'X)^{-1} R'\right]^{-1} \left(R\hat{\beta} - q\right)$$

and, with $\sigma^2$ known,

$$W = \frac{\left(R\hat{\beta} - q\right)' \left[R\,(X'X)^{-1} R'\right]^{-1} \left(R\hat{\beta} - q\right)}{\sigma^2} \sim \chi^2_q \tag{8}$$

Notice that (8) follows the same pattern as (7), except that now it is in quadratic terms, specifically in squared standard deviation units.


Definition 2.6. Let $z_1 \sim \chi^2(p)$ and $z_2 \sim \chi^2(m)$ be independent. Then their ratio, with each divided by its respective degrees of freedom, is a random variable with distribution $F(p, m)$:

$$\frac{z_1/p}{z_2/m} \sim F(p, m)$$

In order to build a test statistic for multiple linear restrictions, first notice that $(n-k)\,\hat{\sigma}^2 = \hat{u}'\hat{u}$, hence

$$\frac{(n-k)\,\hat{\sigma}^2}{\sigma^2} = \frac{\hat{u}'\hat{u}}{\sigma^2} \sim \chi^2_{n-k} \tag{9}$$

where the $\sigma^2$ in the denominator is used to standardize our distribution. Using the RHS of (9) and (8), and dividing each one by its degrees of freedom, we obtain

$$\frac{W/q}{\hat{u}'\hat{u}/\left[\sigma^2(n-k)\right]} = \frac{\chi^2(q)/q}{\chi^2(n-k)/(n-k)} \sim F(q, n-k)$$

$$\left(\frac{\left(R\hat{\beta} - q\right)'\left[R(X'X)^{-1}R'\right]^{-1}\left(R\hat{\beta} - q\right)}{\sigma^2}\right) \frac{1}{q} \Bigg/ \frac{\hat{\sigma}^2\,(n-k)}{\sigma^2\,(n-k)} \tag{10}$$

$$\frac{\left(R\hat{\beta} - q\right)'\left[R(X'X)^{-1}R'\right]^{-1}\left(R\hat{\beta} - q\right)}{q\,\hat{\sigma}^2} \sim F(q, n-k)$$

The unknown $\sigma^2$ cancels, leaving a statistic that can be computed from the data.
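The final statistic can be computed directly from OLS output. A sketch on simulated data, where the design, true $\beta$, and restriction rows $R$ are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, 2.0])                 # true beta_2 = beta_3 = 2
y = X @ beta + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)            # OLS estimate
u = y - X @ b                                    # residuals
s2 = u @ u / (n - k)                             # sigma^2 hat

def F_stat(R, q):
    """Final form of equation (10) for H0: R beta = q."""
    d = R @ b - q
    V = R @ np.linalg.inv(X.T @ X) @ R.T
    return (d @ np.linalg.solve(V, d)) / (R.shape[0] * s2)

# True restriction beta_2 - beta_3 = 0: F should be moderate
F_true = F_stat(np.array([[0., 1., -1.]]), np.array([0.]))
# False restriction beta_2 = 0: F should be large
F_false = F_stat(np.array([[0., 1., 0.]]), np.array([0.]))
assert F_true < 10 and F_false > 20
```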

    3 Asymptotic theory

Definition 3.1. Convergence in probability. Let $X$ be a r.v. and $\{X_n\}$ a sequence of random variables. Then, for every $\varepsilon > 0$,

$$\lim_{n\to\infty} P\left(|X_n - X| > \varepsilon\right) = 0 \quad\text{or, equivalently,}\quad \lim_{n\to\infty} P\left(|X_n - X| \le \varepsilon\right) = 1$$

It is denoted $X_n \xrightarrow{p} X$.

Definition 3.2. Almost sure convergence. Let $X$ be a r.v. and $\{X_n\}$ a sequence of random variables. Then, for every $\varepsilon > 0$,

$$P\left(\lim_{n\to\infty} |X_n - X| > \varepsilon\right) = 0 \quad\text{or}\quad P\left(\lim_{n\to\infty} |X_n - X| \le \varepsilon\right) = 1 \quad\text{or}\quad \lim_{n\to\infty} P\left(\sup_{m\ge n} |X_m - X| \le \varepsilon\right) = 1$$

It is denoted $X_n \xrightarrow{a.s.} X$.

Definition 3.3. Convergence in $r$th mean. Let $X$ be a r.v. and $\{X_n\}$ a sequence of random variables. Then

$$E|X_n - X|^r \xrightarrow[n\to\infty]{} 0$$


Definition 3.4. A sequence $\{X_n\}$ is at most of order $n^\lambda$ in probability, denoted $O_p\!\left(n^\lambda\right)$, if there exists a non-stochastic $O(1)$ (bounded) sequence $\{a_n\}$ s.t.

$$\frac{X_n}{n^\lambda} - a_n \xrightarrow{p} 0,$$

i.e.

$$\lim_{n\to\infty} P\left(\left|\frac{X_n}{n^\lambda} - a_n\right| > \varepsilon\right) = 0$$

Definition 3.5. A sequence $\{X_n\}$ is of order smaller than $n^\lambda$ in probability, denoted $o_p\!\left(n^\lambda\right)$, if

$$\frac{X_n}{n^\lambda} \xrightarrow{p} 0,$$

i.e.

$$\lim_{n\to\infty} P\left(\left|\frac{X_n}{n^\lambda}\right| > \varepsilon\right) = 0$$

Definition 3.6. A sequence of random variables $\{X_n\}$ is stochastically bounded, denoted $O_p(1)$, if for any arbitrarily small $\varepsilon > 0$ there exists a finite $M$ s.t.

$$P\left(|X_n| > M\right) \le \varepsilon \quad \text{for all } n.$$
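A simulation sketch of Definition 3.5 for the centered sample mean, which is $o_p(1)$ by the law of large numbers; the sample sizes and $\varepsilon$ below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
reps = 5000

def abs_dev(n):
    """Monte Carlo draws of |Xbar_n - mu| across many replications (mu = 0)."""
    x = rng.normal(loc=0.0, scale=1.0, size=(reps, n))
    return np.abs(x.mean(axis=1))

# Xbar_n - mu is o_p(1): P(|Xbar_n - mu| > eps) shrinks as n grows
eps = 0.1
p_small = (abs_dev(100) > eps).mean()     # moderate n: noticeable probability
p_large = (abs_dev(10000) > eps).mean()   # large n: probability near zero
assert p_small > p_large
assert p_large < 0.01
```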

    Note 2. See Wooldridge page 35.
