Charles University
Faculty of Social Sciences (FSV UK)
Institute of Economic Studies
STAKAN III
Jan Ámos Víšek
Econometrics
Tuesday, 14.00 – 15.20
Third Lecture
http://samba.fsv.cuni.cz/~visek/Econometrics_Up_To_2010/
Schedule of today's talk

Recalling OLS and the definition of a linear estimator.
Discussion of restrictions on linearity in the case of estimators and of models.
Proof of the theorem given at the end of the last lecture.
Definition of the best (linear unbiased) estimator.
Under normality of disturbances, OLS is BUE.
Ordinary Least Squares (Czech: odhad metodou nejmenších čtverců)

$$\hat\beta^{(OLS,n)} = (X^TX)^{-1}X^TY = \beta^0 + (X^TX)^{-1}X^T\varepsilon$$

Definition
An estimator $\tilde\beta(Y,X) = LY$, where $L = L(X)$ is a $(p \times n)$ matrix, is called a linear estimator.
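The closed-form estimator $\hat\beta^{(OLS,n)} = (X^TX)^{-1}X^TY$ can be illustrated numerically; a minimal Python sketch (the design, sample size and coefficients below are my own choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))                    # design matrix
beta0 = np.array([1.0, -2.0, 0.5])             # true coefficients
Y = X @ beta0 + rng.normal(scale=0.1, size=n)  # disturbances with sigma = 0.1

# beta_hat = (X'X)^{-1} X'Y: solve the normal equations rather than
# forming an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
```

For ill-conditioned designs, `np.linalg.lstsq` is the numerically safer route to the same estimate.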
Theorem

Assumptions
Let $\{\varepsilon_i\}_{i=1}^\infty$ be a sequence of r.v.'s with $E\varepsilon_i = 0$ and $E\varepsilon_i\varepsilon_j = \sigma^2\delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta, i.e. $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ for $i \neq j$.

Assertions
Then $\hat\beta^{(OLS,n)}$ is the best linear unbiased estimator.

Assumptions
If moreover $X^TX = O(n)$, $(X^TX)^{-1} = O(n^{-1})$ and the $\varepsilon_i$'s are independent,

Assertions
then $\hat\beta^{(OLS,n)}$ is consistent.

Assumptions
If further $\lim_{n\to\infty}\frac{1}{n}X^TX = Q$, with $Q$ a regular matrix,

Assertions
then
$$\mathcal{L}\left(\sqrt{n}\left(\hat\beta^{(OLS,n)} - \beta^0\right)\right) \to N\left(0, \sigma^2 Q^{-1}\right),$$
where $\mathrm{cov}\left(\sqrt{n}\left(\hat\beta^{(OLS,n)} - \beta^0\right)\right) \to \sigma^2 Q^{-1}$.
Proof

$\hat\beta^{(OLS,n)} = (X^TX)^{-1}X^TY = LY$, hence $\hat\beta^{(OLS,n)}$ is linear. Remember that we have denoted $L = (X^TX)^{-1}X^T$.

$E\hat\beta^{(OLS,n)} = \beta^0 + (X^TX)^{-1}X^T E\varepsilon = \beta^0$, hence $\hat\beta^{(OLS,n)}$ is unbiased.

It remains to show that $\hat\beta^{(OLS,n)}$ is BLUE.
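The two proof steps above (linearity in $Y$ and unbiasedness) can be checked by simulation; a sketch under my own choice of design and error law:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 50, 2, 2000
X = rng.normal(size=(n, p))        # fixed design across replications
beta0 = np.array([2.0, -1.0])
L = np.linalg.solve(X.T @ X, X.T)  # L = (X'X)^{-1} X', so beta_hat = L @ Y

estimates = np.empty((reps, p))
for r in range(reps):
    Y = X @ beta0 + rng.normal(size=n)  # sigma^2 = 1
    estimates[r] = L @ Y                # the estimator is linear in Y
mean_est = estimates.mean(axis=0)       # should be close to beta0
```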
Definition

The estimator $\hat\beta$ is the best one in a given class $G$ of estimators if for any other $\tilde\beta \in G$ the matrix $\mathrm{cov}\{\tilde\beta\} - \mathrm{cov}\{\hat\beta\}$ is positive semidefinite, i.e. for any $\lambda \in R^p$ we have
$$\lambda^T\left(\mathrm{cov}\{\tilde\beta\} - \mathrm{cov}\{\hat\beta\}\right)\lambda \ge 0.$$

Recall that $\mathrm{cov}\{Z\} = E\{(Z - EZ)(Z - EZ)^T\}$.
$\hat\beta^{(OLS,n)}$ is the best in the class of unbiased linear estimators

$$\mathrm{cov}\{\hat\beta^{(OLS,n)}\} = \sigma^2 LL^T.$$

Let $\tilde\beta = \tilde LY$ be unbiased, i.e. for any $\beta^0 \in R^p$ we have $E\tilde\beta = \tilde LX\beta^0 = \beta^0$, i.e. $\tilde LX = I$ (the unit matrix). Then
$$\mathrm{cov}\{\tilde\beta\} = E\left\{(\tilde LY - \beta^0)(\tilde LY - \beta^0)^T\right\} = E\left\{(\tilde LX\beta^0 + \tilde L\varepsilon - \beta^0)(\tilde LX\beta^0 + \tilde L\varepsilon - \beta^0)^T\right\} = \tilde L\,E\{\varepsilon\varepsilon^T\}\,\tilde L^T = \sigma^2\tilde L\tilde L^T.$$
$\hat\beta^{(OLS,n)}$ is the best in the class of unbiased linear estimators

We have $\mathrm{cov}\{\tilde\beta\} = \sigma^2\tilde L\tilde L^T$ and $\mathrm{cov}\{\hat\beta^{(OLS,n)}\} = \sigma^2 LL^T$. Now
$$L\tilde L^T = (X^TX)^{-1}X^T\tilde L^T = (X^TX)^{-1}(\tilde LX)^T = (X^TX)^{-1} = LL^T,$$
and similarly $\tilde LL^T = LL^T$. Hence
$$\tilde L\tilde L^T = (\tilde L - L + L)(\tilde L - L + L)^T = (\tilde L - L)(\tilde L - L)^T + (\tilde L - L)L^T + L(\tilde L - L)^T + LL^T = (\tilde L - L)(\tilde L - L)^T + LL^T,$$
so that $\mathrm{cov}\{\tilde\beta\} - \mathrm{cov}\{\hat\beta^{(OLS,n)}\} = \sigma^2(\tilde L - L)(\tilde L - L)^T$ is positive semidefinite.
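The key identity — that for any other unbiased linear estimator the covariance exceeds that of OLS by a positive semidefinite matrix — can be verified numerically; the construction of the competing weight matrix below (perturbing the OLS weights by rows orthogonal to the columns of $X$) is my own:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 2
X = rng.normal(size=(n, p))
L = np.linalg.solve(X.T @ X, X.T)  # OLS weights, L @ X = I

# Any other unbiased linear estimator has weights Ltilde = L + D with D @ X = 0;
# project random rows onto the null space of X' to build such a D.
D = rng.normal(size=(p, n))
D -= (D @ X) @ np.linalg.solve(X.T @ X, X.T)
Ltilde = L + D
assert np.allclose(Ltilde @ X, np.eye(p))  # still unbiased

# Covariance difference (up to the common factor sigma^2) must be psd.
diff = Ltilde @ Ltilde.T - L @ L.T
eigvals = np.linalg.eigvalsh(diff)
```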
$\hat\beta^{(OLS,n)}$ is consistent

$$\hat\beta^{(OLS,n)} - \beta^0 = (X^TX)^{-1}X^T\varepsilon = \left(\frac{1}{n}X^TX\right)^{-1}\frac{1}{n}X^T\varepsilon.$$

Denote $Z^{(n)} = \frac{1}{n}X^T\varepsilon \in R^p$, i.e. $Z_k^{(n)} = \frac{1}{n}\sum_{i=1}^n X_{ik}\varepsilon_i$, and put $\eta_{ki} = X_{ik}\varepsilon_i$; then $E\eta_{ki} = 0$ and $\mathrm{var}\{\eta_{ki}\} = \sigma_i^2 X_{ik}^2$.
$\hat\beta^{(OLS,n)}$ is consistent

Lemma – law of large numbers
Let $\{\varepsilon_i\}_{i=1}^\infty$ be a sequence of independent r.v.'s with finite means $\mu_i$ and positive variances $\sigma_i^2$, $i = 1, 2, \dots$. Let moreover
$$\frac{1}{n^2}\sum_{i=1}^n \mathrm{var}\{\varepsilon_i\} \to 0.$$
Then $\frac{1}{n}\sum_{i=1}^n(\varepsilon_i - \mu_i) \to 0$ in probability.

Proof: For any $\delta > 0$, by the Chebyshev inequality
$$P\left(\left|\frac{1}{n}\sum_{i=1}^n(\varepsilon_i - \mu_i)\right| > \delta\right) \le \frac{1}{\delta^2 n^2}\sum_{i=1}^n \mathrm{var}\{\varepsilon_i\} \to 0.$$
$\hat\beta^{(OLS,n)}$ is consistent

Recalling the previous slide (Lemma – law of large numbers): if $\{\varepsilon_i\}_{i=1}^\infty$ are independent with finite means $\mu_i$ and positive variances $\sigma_i^2$, and $\frac{1}{n^2}\sum_{i=1}^n\mathrm{var}\{\varepsilon_i\} \to 0$, then $\frac{1}{n}\sum_{i=1}^n(\varepsilon_i - \mu_i) \to 0$ in probability.

Apply it to $\eta_{ki} = X_{ik}\varepsilon_i$ with $E\eta_{ki} = 0$ and $\mathrm{var}\{\eta_{ki}\} = \sigma_i^2 X_{ik}^2$:
$$\frac{1}{n^2}\sum_{i=1}^n \mathrm{var}\{\eta_{ki}\} = \frac{1}{n^2}\sum_{i=1}^n \sigma_i^2 X_{ik}^2 \to 0,$$
hence
$$Z_k^{(n)} = \frac{1}{n}\sum_{i=1}^n X_{ik}\varepsilon_i \to 0 \quad\text{in probability.}$$
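The convergence $Z_k^{(n)} = \frac{1}{n}\sum_i X_{ik}\varepsilon_i \to 0$ can be illustrated by simulation (the design and the sample sizes are my own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
errors = []
for n in (100, 1000, 10000):
    X = rng.normal(size=(n, 2))
    eps = rng.normal(size=n)
    Z = X.T @ eps / n          # (1/n) X' eps, shrinks as n grows
    errors.append(np.abs(Z).max())
```

The maxima in `errors` shrink roughly like $1/\sqrt{n}$ as the sample size increases.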
$\hat\beta^{(OLS,n)}$ is asymptotically normal

Central Limit Theorem – Feller-Lindeberg
Let $\{\varepsilon_i\}_{i=1}^\infty$ be a sequence of independent r.v.'s with finite means $\mu_i$ and positive variances $\sigma_i^2$, $i = 1, 2, \dots$. Let moreover
$$C_n^2 = \sum_{i=1}^n \mathrm{var}\{\varepsilon_i\} \quad\text{and}\quad Z_n = C_n^{-1}\sum_{i=1}^n(\varepsilon_i - \mu_i).$$
Then
$$\mathcal{L}(Z_n) \to N(0,1) \quad\text{and}\quad \lim_{n\to\infty}\max_{1\le i\le n} C_n^{-1}\sigma_i = 0$$
if and only if for any $\delta > 0$
$$\lim_{n\to\infty} C_n^{-2}\sum_{i=1}^n \int_{|z - \mu_i| > \delta C_n} (z - \mu_i)^2\, dF_i(z) = 0.$$
$\hat\beta^{(OLS,n)}$ is asymptotically normal

Varadarajan theorem
Let $\{Z^{(n)}\}_{n=1}^\infty = \{(Z_1^{(n)}, Z_2^{(n)}, \dots, Z_p^{(n)})\}_{n=1}^\infty$ be a sequence of vectors from $R^p$ with d.f.'s $F^{(n)}$. Further, for any $\lambda \in R^p$ let $F_\lambda^{(n)}$ be the d.f. of $\lambda_1 Z_1^{(n)} + \lambda_2 Z_2^{(n)} + \dots + \lambda_p Z_p^{(n)}$. Moreover, let $F$ be the d.f. of $(Z_1, Z_2, \dots, Z_p)$ and $F_\lambda$ be the d.f. of $\lambda_1 Z_1 + \lambda_2 Z_2 + \dots + \lambda_p Z_p$. If $F_\lambda^{(n)} \to F_\lambda$ for any $\lambda \in R^p$, then $F^{(n)} \to F$.
$\hat\beta^{(OLS,n)}$ is asymptotically normal

$$\sqrt{n}\left(\hat\beta^{(OLS,n)} - \beta^0\right) = \left(\frac{1}{n}X^TX\right)^{-1}\frac{1}{\sqrt{n}}X^T\varepsilon.$$

Firstly we verify the conditions of the Feller-Lindeberg theorem for $\frac{1}{\sqrt{n}}\lambda^TX^T\varepsilon$ for arbitrary $\lambda \in R^p$, and secondly we apply the Varadarajan theorem. Then we transform the asymptotically normally distributed vector $\frac{1}{\sqrt{n}}X^T\varepsilon$ by the matrix $\left(\frac{1}{n}X^TX\right)^{-1}$.
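The limiting covariance $\sigma^2 Q^{-1}$ of $\sqrt{n}(\hat\beta^{(OLS,n)} - \beta^0)$ can be checked by Monte Carlo (the setup below is of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, reps, sigma = 200, 2, 3000, 0.5
X = rng.normal(size=(n, p))
Q = X.T @ X / n
beta0 = np.array([1.0, -1.0])

root_n_dev = np.empty((reps, p))
for r in range(reps):
    Y = X @ beta0 + rng.normal(scale=sigma, size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    root_n_dev[r] = np.sqrt(n) * (beta_hat - beta0)

emp_cov = np.cov(root_n_dev, rowvar=False)  # empirical covariance
theory = sigma**2 * np.linalg.inv(Q)        # sigma^2 Q^{-1}
```

For a fixed design, $\mathrm{cov}(\sqrt{n}(\hat\beta - \beta^0)) = \sigma^2 n (X^TX)^{-1}$ holds exactly, so the two matrices should agree up to Monte Carlo noise.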
$\hat\beta^{(OLS,n)}$ is the best in the class of unbiased linear estimators

REMARK

Normal equations:
$$\sum_{i=1}^n \left(Y_i - X_i^T\hat\beta\right)X_{ij} = 0, \qquad j = 1, 2, \dots, p.$$

If either $\left(Y_{i_1} - X_{i_1}^T\hat\beta\right)^2$ for some $i_1$ or $X_{i_2 j}^2$ for some $i_2$ is large, it may cause serious problems when solving the normal equations, and the solution can be rather strange. (See the next slides!)
Outlier
Solution given by OLS
A “reasonable” model, neglecting the outlier
Leverage point
Solution given by OLS
A “reasonable” model, neglecting the leverage point
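The leverage-point effect pictured above can be reproduced with a toy data set (entirely synthetic, of my own making): a single observation far out in the x-direction drags the OLS slope away from the trend of the remaining data.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20
x = np.linspace(0, 1, n)
y = 2.0 * x + rng.normal(scale=0.05, size=n)  # true slope 2

X = np.column_stack([np.ones(n), x])
slope_clean = np.linalg.solve(X.T @ X, X.T @ y)[1]

# One leverage point far away in x, with a response inconsistent with the trend.
x_bad = np.append(x, 10.0)
y_bad = np.append(y, 0.0)
Xb = np.column_stack([np.ones(n + 1), x_bad])
slope_contaminated = np.linalg.solve(Xb.T @ Xb, Xb.T @ y_bad)[1]
```

A single point out of twenty-one is enough to pull the fitted slope from roughly 2 down toward zero, illustrating why common sense "rejects" such a point before fitting.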
Conclusion I

The solution given by OLS may be different from that expected by common sense. One reason is that $\hat\beta^{(OLS,n)}$ is the best only among linear estimators.

Conclusion II

Drawing the data from the previous slide on the screen of a PC, common sense proposes to reject the leverage point and then apply OLS. We then obtain a "reasonable" model, but it cannot be written as $\tilde\beta = LY$, where $Y$ is the response for all the data. So this estimator is not linear.

The restriction to linear estimators can turn out to be drastic!!
Conclusion III

The restriction to a linear regression model is not substantial.

And what does the restriction to a linear model represent? Remember, we have considered the model

Time total = -3.62 + 1.27 * Weight - 0.53 * Puls - 0.51 * Strength + 3.90 * Time per ¼-mile.

But it is easy to test whether the model

Time total = -3.62 + 1.27 * Weight + a * Weight² - 0.53 * Puls + b * Puls³ - 0.51 * Strength + c * log(Strength) + 3.90 * Time per ¼-mile

is not a better one.

Weierstrass approximation theorem
The system of all polynomials is dense in the space of continuous functions on a compact set.
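One standard way to test whether such an enlarged model improves the fit is a nested-model F-test on the added terms; a sketch with synthetic data (the regressor `w` stands in for, e.g., Weight, and all data are my own):

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(8)
n = 80
w = rng.uniform(1, 2, n)                            # stand-in regressor
y = -1.0 + 1.3 * w + rng.normal(scale=0.1, size=n)  # truly linear in w

X_small = np.column_stack([np.ones(n), w])
X_big = np.column_stack([np.ones(n), w, w**2])      # adds the a * Weight^2 term

def rss(X):
    """Residual sum of squares of the OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return r @ r

q = 1  # number of added regressors
dof = n - X_big.shape[1]
F = (rss(X_small) - rss(X_big)) / q / (rss(X_big) / dof)
p_value = 1 - f_dist.cdf(F, q, dof)
```

A large p-value keeps the smaller model; a small one supports the added term.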
What is the mutual relation between
the linearity of the estimator of regression coefficients
and
the linearity of the regression model?

The answer is simpler than one would expect: NONE.
Conclusion IV

We should find the conditions under which OLS is the best estimator among all unbiased estimators (and use OLS only under these conditions).

And why did OLS become so popular?

Firstly
It has a simple geometric interpretation, implying the existence of a solution together with an easy proof of its properties.

Secondly
There is a simple formula for evaluating it, although the evaluation need not be straightforward. Nowadays, however, there are a lot of implementations which are safe against numerical difficulties.
Maximum Likelihood Estimator (Czech: maximálně věrohodný odhad)

Recalling the definition: let $\mathcal{L}(\varepsilon_i) = F$ and let $f(z)$ be the density of the distribution $F$. Then
$$\hat\beta^{(ML,n)} = \mathop{\mathrm{argmax}}_{\beta \in R^p} \prod_{i=1}^n f\left(Y_i - X_i^T\beta\right).$$

Theorem

Assumptions
Let $\{\varepsilon_i\}_{i=1}^\infty$ be iid r.v.'s, $\mathcal{L}(\varepsilon_i) = N(0, \sigma^2)$, $\sigma^2 \in (0, \infty)$.

Assertions
Then $\hat\beta^{(OLS,n)} = \hat\beta^{(ML,n)}$ and $\hat\beta^{(OLS,n)}$ attains the Rao-Cramér lower bound, i.e. $\hat\beta^{(OLS,n)}$ is the best unbiased estimator (BUE, not only BLUE).

Assumptions
If $\hat\beta^{(OLS,n)}$ is the best unbiased estimator attaining the Rao-Cramér lower bound of variance,

Assertions
then $\mathcal{L}(\varepsilon_i) = N(0, \sigma^2)$.
Maximum Likelihood Estimator under the assumption of normality of disturbances

$$\hat\beta^{(ML,n)} = \mathop{\mathrm{argmax}}_{\beta\in R^p} \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(Y_i - X_i^T\beta)^2}{2\sigma^2}\right\}$$

(a monotone transformation doesn't change the location of an extreme — take the logarithm)

$$= \mathop{\mathrm{argmax}}_{\beta\in R^p}\left\{-\frac{n}{2}\log(2\pi\sigma^2) - \sum_{i=1}^n\frac{(Y_i - X_i^T\beta)^2}{2\sigma^2}\right\}$$

(the first term is a constant with respect to $\beta$)

$$= \mathop{\mathrm{argmax}}_{\beta\in R^p}\left\{- \sum_{i=1}^n\frac{(Y_i - X_i^T\beta)^2}{2\sigma^2}\right\}$$

(the change of sign changes "max" to "min")

$$= \mathop{\mathrm{argmin}}_{\beta\in R^p}\sum_{i=1}^n\left(Y_i - X_i^T\beta\right)^2 = \hat\beta^{(OLS,n)}.$$
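That the maximizer of the normal likelihood coincides with the closed-form OLS estimate can be confirmed numerically (data of my own making; `scipy.optimize.minimize` is used as a generic optimizer):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n, p, sigma = 100, 2, 1.0
X = rng.normal(size=(n, p))
beta0 = np.array([0.5, 1.5])
Y = X @ beta0 + rng.normal(scale=sigma, size=n)

def neg_log_lik(beta):
    """Negative normal log-likelihood in beta (sigma treated as known)."""
    resid = Y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + resid @ resid / (2 * sigma**2)

beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)  # closed form
beta_ml = minimize(neg_log_lik, np.zeros(p)).x  # numerical ML
```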
Recalling the Rao-Cramér lower bound on the variance of an unbiased estimator

Denote the joint density of the disturbances by $f^n(y, X, \beta, \sigma^2)$; for brevity, write $f(y, \theta)$ instead of $f^n(y, X, \beta, \sigma^2)$.

If $\hat\theta$ is unbiased, then for any $\theta^{(1)}, \theta^{(2)}$
$$\int \hat\theta_j(y)\, f(y, \theta^{(1)})\,dy = \theta_j^{(1)}, \qquad \int \hat\theta_j(y)\, f(y, \theta^{(2)})\,dy = \theta_j^{(2)},$$
hence
$$\int \hat\theta_j(y)\left[f(y, \theta^{(1)}) - f(y, \theta^{(2)})\right]dy = \theta_j^{(1)} - \theta_j^{(2)}.$$

Let us divide both sides by $\theta_k^{(1)} - \theta_k^{(2)}$.
So we have
$$\int \hat\theta_j(y)\,\frac{f(y, \theta^{(1)}) - f(y, \theta^{(2)})}{\theta_k^{(1)} - \theta_k^{(2)}}\,dy = \frac{\theta_j^{(1)} - \theta_j^{(2)}}{\theta_k^{(1)} - \theta_k^{(2)}}.$$

Assume that $\theta_l^{(1)} = \theta_l^{(2)}$ for $l = 1, \dots, p$, $l \neq k$, and $\theta_k^{(1)} \neq \theta_k^{(2)}$. Then let $\theta_k^{(1)} \to \theta_k^{(2)}$:
$$\int \hat\theta_j(y)\,\frac{\partial\log f(y,\theta)}{\partial\theta_k}\,f(y,\theta)\,dy = \delta_{jk}.$$

$\theta^{(2)}$ was arbitrary, hence write $\theta$ instead of it. In matrix form
$$\int \hat\theta(y)\left[\frac{\partial\log f(y,\theta)}{\partial\theta}\right]^T f(y,\theta)\,dy = I.$$

Multiply it by $\lambda^T$ from the left-hand side and by $\mu$ from the right one.
So we have for any $\lambda, \mu \in R^p$
$$\int \lambda^T\hat\theta(y)\left[\frac{\partial\log f(y,\theta)}{\partial\theta}\right]^T\mu\; f(y,\theta)\,dy = \lambda^T\mu.$$

Intermediate considerations:
$$\int \frac{\partial\log f(y,\theta)}{\partial\theta}\,f(y,\theta)\,dy = \int \frac{\partial f(y,\theta)}{\partial\theta}\,dy = \frac{\partial}{\partial\theta}\int f(y,\theta)\,dy = \frac{\partial}{\partial\theta}\,1 = 0,$$
since $\int f(y,\theta)\,dy = 1$.
Further intermediate considerations: by the previous slide
$$\int \theta\left[\frac{\partial\log f(y,\theta)}{\partial\theta}\right]^T f(y,\theta)\,dy = \theta\cdot 0^T = 0.$$

So we have for any $\lambda, \mu \in R^p$
$$\int \lambda^T\hat\theta(y)\left[\frac{\partial\log f(y,\theta)}{\partial\theta}\right]^T\mu\; f(y,\theta)\,dy = \lambda^T\mu.$$
But then also
$$\int \lambda^T\left[\hat\theta(y) - \theta\right]\left[\frac{\partial\log f(y,\theta)}{\partial\theta}\right]^T\mu\; f(y,\theta)\,dy = \lambda^T\mu.$$

Finally, write $f(y,\theta)$ as $\sqrt{f(y,\theta)}\cdot\sqrt{f(y,\theta)}$.
So we have for any $\lambda, \mu \in R^p$
$$\lambda^T\mu = \int \left[\lambda^T\left(\hat\theta(y) - \theta\right)\sqrt{f(y,\theta)}\right]\cdot\left[\left(\frac{\partial\log f(y,\theta)}{\partial\theta}\right)^T\mu\,\sqrt{f(y,\theta)}\right]dy.$$

Applying the Cauchy-Schwarz inequality
$$\left[\int g(x)h(x)\,dx\right]^2 \le \int g^2(x)\,dx\cdot\int h^2(x)\,dx,$$
we obtain
$$\left(\lambda^T\mu\right)^2 \le \int\left[\lambda^T\left(\hat\theta(y) - \theta\right)\right]^2 f(y,\theta)\,dy\;\cdot\;\int\left[\left(\frac{\partial\log f(y,\theta)}{\partial\theta}\right)^T\mu\right]^2 f(y,\theta)\,dy.$$
So we have for any $\lambda, \mu \in R^p$
$$\left(\lambda^T\mu\right)^2 \le E\left[\lambda^T\left(\hat\theta - \theta\right)\right]^2\cdot E\left[\mu^T\frac{\partial\log f(y,\theta)}{\partial\theta}\right]^2,$$
i.e. (recall $E\hat\theta = \theta$ and $E\,\partial\log f(y,\theta)/\partial\theta = 0$)
$$\left(\lambda^T\mu\right)^2 \le \mathrm{var}\left\{\lambda^T\hat\theta\right\}\cdot\mathrm{var}\left\{\mu^T\frac{\partial\log f(y,\theta)}{\partial\theta}\right\} = \lambda^T\mathrm{cov}\{\hat\theta\}\lambda\;\cdot\;\mu^T\mathrm{cov}\left\{\frac{\partial\log f(y,\theta)}{\partial\theta}\right\}\mu.$$

Notice, both r.v.'s are scalars!!
Assuming regularity of $\mathrm{cov}\left\{\frac{\partial\log f(y,\theta)}{\partial\theta}\right\}$, select
$$\mu = \left[\mathrm{cov}\left\{\frac{\partial\log f(y,\theta)}{\partial\theta}\right\}\right]^{-1}\lambda.$$
Since the inequality holds for any $\lambda, \mu \in R^p$, we have
$$\left(\lambda^T\left[\mathrm{cov}\left\{\frac{\partial\log f}{\partial\theta}\right\}\right]^{-1}\lambda\right)^2 \le \lambda^T\mathrm{cov}\{\hat\theta\}\lambda\;\cdot\;\lambda^T\left[\mathrm{cov}\left\{\frac{\partial\log f}{\partial\theta}\right\}\right]^{-1}\lambda,$$
and hence
$$\lambda^T\left[\mathrm{cov}\left\{\frac{\partial\log f}{\partial\theta}\right\}\right]^{-1}\lambda \le \lambda^T\mathrm{cov}\{\hat\theta\}\lambda.$$
Since it holds for any $\lambda \in R^p$, we have
$$0 \le \left[\mathrm{cov}\left\{\frac{\partial\log f(y,\theta)}{\partial\theta}\right\}\right]^{-1} \le \mathrm{cov}\{\hat\theta\}$$
(the inequalities are in the sense of positive semidefinite matrices).

The Cauchy-Schwarz inequality has been applied to
$$\int \lambda^T\left(\hat\theta(y) - \theta\right)\sqrt{f(y,\theta)}\cdot\left(\frac{\partial\log f(y,\theta)}{\partial\theta}\right)^T\mu\,\sqrt{f(y,\theta)}\,dy.$$

We would like to reach equality!
Hence the equality is reached iff $\frac{\partial\log f(y,\theta)}{\partial\theta}$ is a linear function of $\hat\theta(y,X)$, i.e.
$$\frac{\partial\log f^n(y,X,\beta)}{\partial\beta} = A(\beta)\left(\hat\theta(y,X) - g(\beta)\right),$$
where $A(\beta)$ is a $(p\times p)$ matrix and $g(\beta)\in R^p$.

Remember the joint density of the disturbances is
$$f^n(y,X,\beta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(y_i - X_i^T\beta)^2}{2\sigma^2}\right\},$$
hence
$$\frac{\partial\log f^n(y,X,\beta)}{\partial\beta} = \sigma^{-2}\sum_{i=1}^n\left(y_i - X_i^T\beta\right)X_i.$$
Hence
$$A(\beta)\hat\theta(y,X) = \sigma^{-2}\sum_{i=1}^n\left(y_i - X_i^T\beta\right)X_i + A(\beta)g(\beta) = \sigma^{-2}X^Ty - \sigma^{-2}X^TX\beta + A(\beta)g(\beta).$$

$\hat\theta(y,X)$ cannot depend on $\beta$, so the terms containing $\beta$ have to cancel, i.e. $A(\beta)g(\beta) = \sigma^{-2}X^TX\beta + a$ with some $a\in R^p$, and so
$$\hat\theta(y,X) = A^{-1}(\beta)\,\sigma^{-2}X^Ty + A^{-1}(\beta)\,a.$$

$\hat\theta(y,X)$ is to be unbiased, i.e. $E\hat\theta = \beta$ for any $\beta\in R^p$:
$$E\left[A^{-1}\sigma^{-2}X^TY + A^{-1}a\right] = A^{-1}\sigma^{-2}X^TX\beta + A^{-1}a = \beta \quad\text{for any } \beta\in R^p,$$
and so $A^{-1}\sigma^{-2}X^T = (X^TX)^{-1}X^T$ and $a = 0$. Finally
$$\hat\theta(Y,X) = (X^TX)^{-1}X^TY = \hat\beta^{(OLS,n)}.$$
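Under normal disturbances, the covariance of $\hat\beta^{(OLS,n)}$ indeed matches the Rao-Cramér bound $\sigma^2(X^TX)^{-1}$ (the inverse Fisher information); a Monte Carlo check under my own setup:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, sigma, reps = 60, 2, 1.0, 4000
X = rng.normal(size=(n, p))
beta0 = np.array([1.0, 0.0])

estimates = np.empty((reps, p))
for r in range(reps):
    Y = X @ beta0 + rng.normal(scale=sigma, size=n)  # normal errors
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ Y)

emp_cov = np.cov(estimates, rowvar=False)          # empirical cov of OLS
cramer_rao = sigma**2 * np.linalg.inv(X.T @ X)     # inverse Fisher information
```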
The proof of the opposite direction.

If $\hat\beta^{(OLS,n)}$ attains the Rao-Cramér lower bound, then the equality in the Cauchy-Schwarz inequality is reached, and hence (writing $\hat\theta(y,X)$ instead of $\hat\beta^{(OLS,n)}$)
$$\frac{\partial\log f^n(y,X,\beta)}{\partial\beta} = A(\beta)\left(\hat\theta(y,X) - g(\beta)\right)$$
$$\frac{\partial\log f^n(y,X,\beta)}{\partial\beta} = A(\beta)\hat\theta(y,X) - b(\beta), \qquad b(\beta) = A(\beta)g(\beta) \in R^p,$$
$$\frac{\partial\log f^n(y,X,\beta)}{\partial\beta} = A(\beta)(X^TX)^{-1}X^Ty - b(\beta)$$
(notice that after integration with respect to $\beta$ an additive term $U(y)$ appears).
This we only rewrote from the previous slide:
$$\frac{\partial\log f^n(y,X,\beta)}{\partial\beta} = A(\beta)(X^TX)^{-1}X^Ty - b(\beta).$$

Since $X^TX$ is a regular matrix, there is a vector $\tau(\beta)\in R^p$ collecting, after integration with respect to $\beta$, the terms containing $y$:
$$\log f^n(y,X,\beta) = \sigma^{-2}\tau(\beta)^TX^Ty - c(\beta) + U(y)$$
$$f^n(y,X,\beta) = \exp\left\{\sigma^{-2}\tau(\beta)^TX^Ty\right\}\cdot\exp\{-c(\beta)\}\cdot\tilde U(y)$$
$$f^n(y,X,\beta) = \exp\left\{-\frac{1}{2\sigma^2}(y - X\beta)^T(y - X\beta)\right\}\cdot\exp\{\tilde c(\beta)\}\cdot\tilde{\tilde U}(y).$$

It has to hold for any $\beta\in R^p$ and any $X$ of type $(n\times p)$, and
$$\int f^n(y,X,\beta)\,dy = 1 \quad\text{and}\quad (X^TX)^{-1}X^T\int y\, f^n(y,X,\beta)\,dy = \beta.$$
Imposing the marginal conditions, we finally obtain
$$f^n(y,X,\beta) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left\{-\frac{1}{2\sigma^2}(y - X\beta)^T(y - X\beta)\right\},$$
i.e. the disturbances are normally distributed.
What is to be learnt from this lecture for the exam?

Linearity of the estimator and of the model – what advantages and restrictions do they represent?
What does it mean: "The estimator is the best in the class of … ."?
OLS is the best unbiased estimator – the condition(s) for it.

All that you need is on http://samba.fsv.cuni.cz/~visek/Econometrics_Up_To_2010