Christopher Dougherty
EC220 - Introduction to econometrics (chapter 1) Slideshow: goodness of fit
Original citation:
Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 1). [Teaching Resource]
© 2012 The Author
This version available at: http://learningresources.lse.ac.uk/127/
Available in LSE Learning Resources Online: May 2012
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms. http://creativecommons.org/licenses/by-sa/3.0/
http://learningresources.lse.ac.uk/
GOODNESS OF FIT
Four useful results:
\[ \bar{e} = 0, \qquad \bar{\hat{Y}} = \bar{Y}, \qquad \sum X_i e_i = 0, \qquad \sum \hat{Y}_i e_i = 0 \]
This sequence explains measures of goodness of fit in regression analysis. It is convenient to start by demonstrating four useful results. The first is that the mean value of the residuals must be zero.
The residual in any observation is given by the difference between the actual and fitted values of Y for that observation.
\[ e_i = Y_i - \hat{Y}_i \]
First substitute for the fitted value.
\[ \hat{Y}_i = b_1 + b_2 X_i \quad\Longrightarrow\quad e_i = Y_i - \hat{Y}_i = Y_i - b_1 - b_2 X_i \]
Now sum over all the observations.
\[ \sum e_i = \sum Y_i - n b_1 - b_2 \sum X_i \]
Dividing through by n, we obtain the sample mean of the residuals in terms of the sample means of X and Y and the regression coefficients.
\[ \frac{1}{n}\sum e_i = \frac{1}{n}\sum Y_i - b_1 - b_2\,\frac{1}{n}\sum X_i \]
\[ \bar{e} = \bar{Y} - b_1 - b_2\bar{X} \]
If we substitute for $b_1$, the expression collapses to zero.
\[ b_1 = \bar{Y} - b_2\bar{X} \]
\[ \bar{e} = \bar{Y} - b_1 - b_2\bar{X} = \bar{Y} - (\bar{Y} - b_2\bar{X}) - b_2\bar{X} = 0 \]
Next we will demonstrate that the mean of the fitted values of Y is equal to the mean of the actual values of Y.
Again, we start with the definition of a residual.
\[ e_i = Y_i - \hat{Y}_i \]
Sum over all the observations.
\[ \sum e_i = \sum Y_i - \sum \hat{Y}_i \]
Divide through by n. The terms in the equation are the means of the residuals, actual values of Y, and fitted values of Y, respectively.
\[ \frac{1}{n}\sum e_i = \frac{1}{n}\sum Y_i - \frac{1}{n}\sum \hat{Y}_i \]
\[ \bar{e} = \bar{Y} - \bar{\hat{Y}} \]
We have just shown that the mean of the residuals is zero. Hence the mean of the fitted values is equal to the mean of the actual values.
\[ \bar{e} = \bar{Y} - \bar{\hat{Y}} = 0 \quad\Longrightarrow\quad \bar{\hat{Y}} = \bar{Y} \]
Next we will demonstrate that the sum of the products of the values of X and the residuals is zero.
We start by replacing the residual with its expression in terms of Y and X.
\[ \sum X_i e_i = \sum X_i (Y_i - b_1 - b_2 X_i) \]
We expand the expression.
\[ \sum X_i e_i = \sum X_i (Y_i - b_1 - b_2 X_i) = \sum X_i Y_i - b_1 \sum X_i - b_2 \sum X_i^2 \]
The expression is equal to zero. One way of demonstrating this would be to substitute for $b_1$ and $b_2$ and show that all the terms cancel out.
\[ \sum X_i e_i = \sum X_i Y_i - b_1 \sum X_i - b_2 \sum X_i^2 = 0 \]
A neater way is to recall the first order condition for $b_2$ when deriving the regression coefficients. You can see that it is exactly what we need.
\[ \frac{\partial RSS}{\partial b_2} = 0 \quad\Longrightarrow\quad 2 b_2 \sum X_i^2 - 2 \sum X_i Y_i + 2 b_1 \sum X_i = 0 \]
\[ \sum X_i e_i = \sum X_i Y_i - b_1 \sum X_i - b_2 \sum X_i^2 = 0 \]
Finally we will demonstrate that the sum of the products of the fitted values of Y and the residuals is zero.
We start by substituting for the fitted value of Y.
\[ \sum \hat{Y}_i e_i = \sum (b_1 + b_2 X_i) e_i \]
We expand and rearrange.
\[ \sum \hat{Y}_i e_i = b_1 \sum e_i + b_2 \sum X_i e_i = b_1 n \bar{e} + b_2 \sum X_i e_i \qquad \left( \sum e_i = n\bar{e} \right) \]
The expression is equal to zero, given the first and third useful results.
\[ \sum \hat{Y}_i e_i = b_1 n \bar{e} + b_2 \sum X_i e_i = 0 \]
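As a numerical check on the four useful results, the following Python sketch fits the line to a small made-up data set and evaluates each quantity. The data values and the helper name `ols` are illustrative assumptions, not taken from the slides. Up to floating-point rounding, all four quantities should come out as zero.

```python
def ols(x, y):
    """Return (b1, b2) for the fitted line Y-hat = b1 + b2 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    b1 = y_bar - b2 * x_bar
    return b1, b2


# Made-up illustrative data (an assumption, not from the slides).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(X)

b1, b2 = ols(X, Y)
fitted = [b1 + b2 * xi for xi in X]
resid = [yi - fi for yi, fi in zip(Y, fitted)]

mean_resid = sum(resid) / n                                # result 1: mean of e
mean_gap = sum(fitted) / n - sum(Y) / n                    # result 2: mean of Y-hat minus mean of Y
sum_x_e = sum(xi * ei for xi, ei in zip(X, resid))         # result 3: sum of X * e
sum_fit_e = sum(fi * ei for fi, ei in zip(fitted, resid))  # result 4: sum of Y-hat * e

checks = [
    ("mean e", mean_resid),
    ("mean Y-hat - mean Y", mean_gap),
    ("sum X*e", sum_x_e),
    ("sum Y-hat*e", sum_fit_e),
]
for name, value in checks:
    print(f"{name}: {value:.2e}")
```

Any other data set with at least two distinct values of X should give the same four zeros, since the results follow from the first order conditions rather than from the particular numbers.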
We now come to the discussion of goodness of fit. One measure of the variation in Y is the sum of its squared deviations around its sample mean, often described as the Total Sum of Squares, TSS.
\[ TSS = \sum (Y_i - \bar{Y})^2 \]
We will decompose TSS using the fact that the actual value of Y in any observation is equal to the sum of its fitted value and the residual.
\[ e_i = Y_i - \hat{Y}_i \quad\Longrightarrow\quad Y_i = \hat{Y}_i + e_i \]
We substitute for $Y_i$.
\[ \sum (Y_i - \bar{Y})^2 = \sum \left( [\hat{Y}_i + e_i] - [\bar{\hat{Y}} + \bar{e}] \right)^2 \]
From the useful results, the mean of the fitted values of Y is equal to the mean of the actual values. Also, the mean of the residuals is zero.
\[ \bar{\hat{Y}} = \bar{Y}, \qquad \bar{e} = 0 \]
Hence we can simplify the expression as shown.
\[ \sum (Y_i - \bar{Y})^2 = \sum \left( [\hat{Y}_i + e_i] - [\bar{\hat{Y}} + \bar{e}] \right)^2 = \sum \left( [\hat{Y}_i - \bar{Y}] + e_i \right)^2 \]
We expand the squared terms on the right side of the equation.
\[ \sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum e_i^2 + 2 \sum (\hat{Y}_i - \bar{Y}) e_i \]
We expand the third term on the right side of the equation.
\[ \sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum e_i^2 + 2 \sum \hat{Y}_i e_i - 2 \bar{Y} \sum e_i \]
The last two terms are both zero, given the first and fourth useful results.
\[ \sum \hat{Y}_i e_i = 0; \qquad \bar{e} = 0, \text{ so } \sum e_i = 0 \]
Thus we have shown that TSS, the total sum of squares of Y, can be decomposed into ESS, the ‘explained’ sum of squares, and RSS, the residual (‘unexplained’) sum of squares.
\[ \sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum e_i^2 \]
\[ TSS = ESS + RSS \]
The words explained and unexplained were put in quotation marks because the explanation may in fact be false. Y might really depend on some other variable Z, and X might be acting as a proxy for Z. It would be safer to use the expression apparently explained instead of explained.
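The decomposition of TSS into ESS and RSS can be checked numerically. The sketch below uses a small made-up data set and a hypothetical helper `ols`, both assumptions for illustration rather than anything from the slides, and computes the three sums of squares directly from their definitions.

```python
def ols(x, y):
    """Return (b1, b2) for the fitted line Y-hat = b1 + b2 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2


# Made-up illustrative data (an assumption, not from the slides).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

b1, b2 = ols(X, Y)
y_bar = sum(Y) / len(Y)
fitted = [b1 + b2 * xi for xi in X]
resid = [yi - fi for yi, fi in zip(Y, fitted)]

tss = sum((yi - y_bar) ** 2 for yi in Y)       # total sum of squares
ess = sum((fi - y_bar) ** 2 for fi in fitted)  # 'explained' sum of squares
rss = sum(ei ** 2 for ei in resid)             # residual sum of squares

print(f"TSS = {tss:.4f}")
print(f"ESS + RSS = {ess + rss:.4f}")
```

The two printed values agree only because the line was fitted by least squares. For an arbitrary line the cross-product terms need not vanish, so the equality would generally fail.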
The main criterion of goodness of fit, formally described as the coefficient of determination, but usually referred to as $R^2$, is defined to be the ratio of ESS to TSS, that is, the proportion of the variance of Y explained by the regression equation.
\[ R^2 = \frac{ESS}{TSS} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} \]
Obviously we would like to locate the regression line so as to make the goodness of fit as high as possible, according to this criterion. Does this objective clash with our use of the least squares principle to determine $b_1$ and $b_2$?
Fortunately, there is no clash. To see this, rewrite the expression for $R^2$ in terms of RSS as shown.
\[ R^2 = \frac{TSS - RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum (Y_i - \bar{Y})^2} \]
The OLS regression coefficients are chosen in such a way as to minimize the sum of the squares of the residuals. Thus it automatically follows that they maximize $R^2$.
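To illustrate that minimizing RSS and maximizing $R^2$ are the same thing, the sketch below evaluates $1 - RSS/TSS$ for the OLS line and for a few arbitrarily shifted lines. The data, the shifts, and the helper names `ols` and `r_squared` are assumptions made for illustration. The $1 - RSS/TSS$ form is used for every line because the $ESS/TSS$ form relies on results that hold only for the OLS coefficients.

```python
def ols(x, y):
    """Return (b1, b2) for the fitted line Y-hat = b1 + b2 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2


def r_squared(x, y, b1, b2):
    """Return 1 - RSS/TSS for the line b1 + b2 * X (any b1, b2)."""
    y_bar = sum(y) / len(y)
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - rss / tss


# Made-up illustrative data (an assumption, not from the slides).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

b1, b2 = ols(X, Y)
r2_ols = r_squared(X, Y, b1, b2)

# Arbitrary shifts (d1, d2) applied to the OLS intercept and slope.
shifts = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.1), (0.0, 0.1), (0.3, -0.1)]
r2_other = [r_squared(X, Y, b1 + d1, b2 + d2) for d1, d2 in shifts]

print(f"OLS line: {r2_ols:.4f}")
for (d1, d2), r2 in zip(shifts, r2_other):
    print(f"shift ({d1:+.1f}, {d2:+.1f}): {r2:.4f}")
```

Every shifted line has a larger RSS than the OLS line and therefore a smaller value of $1 - RSS/TSS$, since TSS does not depend on the coefficients.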
Another natural criterion of goodness of fit is the correlation between the actual and fitted values of Y. We will demonstrate that this is maximized by using the least squares principle to determine the regression coefficients.
\[ r_{Y,\hat{Y}} = \frac{\sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{Y})}{\sqrt{\sum (Y_i - \bar{Y})^2 \sum (\hat{Y}_i - \bar{Y})^2}} \]
We will start with the numerator and substitute for the actual value of Y, and its mean, in the first factor.
\[ \sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{Y}) = \sum \left( [\hat{Y}_i + e_i] - [\bar{Y} + \bar{e}] \right)(\hat{Y}_i - \bar{Y}) \]
The mean value of the residuals is zero (first useful result). We rearrange a little.
\[ \sum \left( [\hat{Y}_i + e_i] - [\bar{Y} + \bar{e}] \right)(\hat{Y}_i - \bar{Y}) = \sum \left( [\hat{Y}_i - \bar{Y}] + e_i \right)(\hat{Y}_i - \bar{Y}) \]
We expand the expression. The last two terms are both zero (fourth and first useful results).
\[ \sum \left( [\hat{Y}_i - \bar{Y}] + e_i \right)(\hat{Y}_i - \bar{Y}) = \sum (\hat{Y}_i - \bar{Y})^2 + \sum \hat{Y}_i e_i - \bar{Y} \sum e_i \]
\[ \sum \hat{Y}_i e_i = 0, \qquad \sum e_i = n\bar{e} = 0 \]
Thus the numerator simplifies to the sum of the squared deviations of the fitted values.
\[ \sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{Y}) = \sum (\hat{Y}_i - \bar{Y})^2 \]
We have the same expression in the denominator, under a square root. Cancelling, we are left with the square root in the numerator.
\[ r_{Y,\hat{Y}} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sqrt{\sum (Y_i - \bar{Y})^2 \sum (\hat{Y}_i - \bar{Y})^2}} = \frac{\sqrt{\sum (\hat{Y}_i - \bar{Y})^2}}{\sqrt{\sum (Y_i - \bar{Y})^2}} \]
Thus the correlation coefficient is the square root of $R^2$. It follows that it is maximized by the use of the least squares principle to determine the regression coefficients.
\[ r_{Y,\hat{Y}} = \sqrt{\frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}} = \sqrt{R^2} \]
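The relationship between the correlation coefficient and $R^2$ can also be confirmed numerically. The sketch below computes the sample correlation between actual and fitted values directly from its definition and compares it with the square root of $ESS/TSS$. The data and the helper names `ols` and `correlation` are illustrative assumptions, not part of the slides.

```python
from math import sqrt


def ols(x, y):
    """Return (b1, b2) for the fitted line Y-hat = b1 + b2 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2


def correlation(u, v):
    """Return the sample correlation coefficient between u and v."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_vv = sum((vi - v_bar) ** 2 for vi in v)
    return s_uv / sqrt(s_uu * s_vv)


# Made-up illustrative data (an assumption, not from the slides).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

b1, b2 = ols(X, Y)
y_bar = sum(Y) / len(Y)
fitted = [b1 + b2 * xi for xi in X]

ess = sum((fi - y_bar) ** 2 for fi in fitted)
tss = sum((yi - y_bar) ** 2 for yi in Y)
r2 = ess / tss

r = correlation(Y, fitted)
print(f"r = {r:.6f}, sqrt(R^2) = {sqrt(r2):.6f}")
```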
Copyright Christopher Dougherty 2011.
These slideshows may be downloaded by anyone, anywhere for personal use.
Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.
The content of this slideshow comes from Sections 1.5 and 1.6 of C. Dougherty,
Introduction to Econometrics, fourth edition 2011, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
http://www.oup.com/uk/orc/bin/9780199567089/.
Individuals studying econometrics on their own and who feel that they might
benefit from participation in a formal course should consider the London School
of Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
20 Elements of Econometrics
www.londoninternational.ac.uk/lse.
11.07.25