Linear Regression Primer

7/27/2019 Linear Regression Primer

DERIVING LINEAR REGRESSION COEFFICIENTS

This sequence shows how the regression coefficients for a simple regression model are derived, using the least squares criterion (OLS, for ordinary least squares).

[Figure: scatter diagram of Y against X (X axis from 0 to 3, Y axis from 0 to 6) showing the observations $Y_1$, $Y_2$, $Y_3$.]

True model: $Y = \beta_1 + \beta_2 X + u$


We will start with a numerical example with just three observations: (1,3), (2,5), and (3,6).

[Figure: scatter diagram of the three observations $Y_1$, $Y_2$, $Y_3$ plotted against X.]

True model: $Y = \beta_1 + \beta_2 X + u$


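Writing the fitted regression as $\hat{Y} = b_1 + b_2 X$, we will determine the values of $b_1$ and $b_2$ that minimize RSS, the sum of the squares of the residuals.

[Figure: the three observations and a fitted line with intercept $b_1$ and slope $b_2$. The fitted values are $\hat{Y}_1 = b_1 + b_2$, $\hat{Y}_2 = b_1 + 2b_2$, and $\hat{Y}_3 = b_1 + 3b_2$.]

True model: $Y = \beta_1 + \beta_2 X + u$
Fitted model: $\hat{Y} = b_1 + b_2 X$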


Given our choice of $b_1$ and $b_2$, the residuals are as shown.

[Figure: the observations and the fitted line, with each residual marked as the vertical distance between an observation and the line.]

$$\begin{aligned}
e_1 &= Y_1 - \hat{Y}_1 = 3 - b_1 - b_2 \\
e_2 &= Y_2 - \hat{Y}_2 = 5 - b_1 - 2b_2 \\
e_3 &= Y_3 - \hat{Y}_3 = 6 - b_1 - 3b_2
\end{aligned}$$
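As a minimal sketch of the residual definitions, the following computes the three residuals for any trial pair of coefficients. The function name `residuals` and the trial values 2.0 and 1.0 are illustrative assumptions, not part of the original slides.

```python
# Observations from the numerical example, as (X, Y) pairs.
DATA = [(1, 3), (2, 5), (3, 6)]


def residuals(b1, b2, data=DATA):
    """Return e_i = Y_i - (b1 + b2 * X_i) for each observation."""
    return [y - (b1 + b2 * x) for x, y in data]


# Hypothetical trial values for the intercept and slope.
print(residuals(2.0, 1.0))  # [0.0, 1.0, 1.0]
```

Different trial values give different residuals; the least squares criterion picks the pair that makes the sum of their squares as small as possible.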


$$RSS = e_1^2 + e_2^2 + e_3^2 = (3 - b_1 - b_2)^2 + (5 - b_1 - 2b_2)^2 + (6 - b_1 - 3b_2)^2$$

The sum of the squares of the residuals is thus as shown above.


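$$\begin{aligned}
RSS &= (3 - b_1 - b_2)^2 + (5 - b_1 - 2b_2)^2 + (6 - b_1 - 3b_2)^2 \\
&= 9 + b_1^2 + b_2^2 - 6b_1 - 6b_2 + 2b_1 b_2 \\
&\quad + 25 + b_1^2 + 4b_2^2 - 10b_1 - 20b_2 + 4b_1 b_2 \\
&\quad + 36 + b_1^2 + 9b_2^2 - 12b_1 - 36b_2 + 6b_1 b_2
\end{aligned}$$

The quadratics have been expanded.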



$$RSS = 70 + 3b_1^2 + 14b_2^2 - 28b_1 - 62b_2 + 12b_1 b_2$$

Like terms have been added together.
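The expansion can be checked numerically. The sketch below, with the illustrative function names `rss_direct` and `rss_expanded`, compares the sum of squared residuals computed from the data with the collected polynomial on a grid of integer trial values. Two quadratics in $b_1$ and $b_2$ that agree on such a grid are the same polynomial.

```python
# Observations from the numerical example, as (X, Y) pairs.
DATA = [(1, 3), (2, 5), (3, 6)]


def rss_direct(b1, b2):
    """Sum of squared residuals computed straight from the data."""
    return sum((y - b1 - b2 * x) ** 2 for x, y in DATA)


def rss_expanded(b1, b2):
    """The same quantity after expanding and collecting like terms."""
    return 70 + 3 * b1**2 + 14 * b2**2 - 28 * b1 - 62 * b2 + 12 * b1 * b2


# Integer trial values keep the comparison exact.
for b1 in range(-3, 4):
    for b2 in range(-3, 4):
        assert rss_direct(b1, b2) == rss_expanded(b1, b2)
```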



For a minimum, the partial derivatives of RSS with respect to $b_1$ and $b_2$ should be zero. (We should also check a second-order condition.)

$$RSS = 70 + 3b_1^2 + 14b_2^2 - 28b_1 - 62b_2 + 12b_1 b_2$$

$$\frac{\partial RSS}{\partial b_1} = 0 \;\Rightarrow\; 6b_1 + 12b_2 - 28 = 0$$

$$\frac{\partial RSS}{\partial b_2} = 0 \;\Rightarrow\; 12b_1 + 28b_2 - 62 = 0$$
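The second-order condition mentioned above can be sketched as follows. For a function of two variables, a stationary point is a minimum when the second derivative with respect to $b_1$ is positive and the determinant of the matrix of second derivatives is positive. The variable names are illustrative; the numbers are the second derivatives of $RSS = 70 + 3b_1^2 + 14b_2^2 - 28b_1 - 62b_2 + 12b_1 b_2$.

```python
# Second partial derivatives of RSS for the numerical example.
h11 = 6   # d2 RSS / d b1^2, from the term 3 * b1**2
h22 = 28  # d2 RSS / d b2^2, from the term 14 * b2**2
h12 = 12  # d2 RSS / (d b1 d b2), from the term 12 * b1 * b2

# Determinant of the 2x2 matrix of second derivatives.
det = h11 * h22 - h12**2

# Both quantities positive means the stationary point is a minimum.
is_minimum = h11 > 0 and det > 0
print(det, is_minimum)  # 24 True
```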



The first-order conditions give us two equations in two unknowns.



Solving them, we find that RSS is minimized when $b_1$ and $b_2$ are equal to 1.67 and 1.50, respectively.

$$b_1 = 1.67, \quad b_2 = 1.50$$
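One way to solve the two first-order conditions, $6b_1 + 12b_2 = 28$ and $12b_1 + 28b_2 = 62$, is Cramer's rule. The sketch below uses exact fractions; the coefficient names `a11`, `a12`, and so on are illustrative assumptions.

```python
from fractions import Fraction

# First-order conditions written as a11*b1 + a12*b2 = c1 and a21*b1 + a22*b2 = c2.
a11, a12, c1 = 6, 12, 28
a21, a22, c2 = 12, 28, 62

# Cramer's rule for a 2x2 linear system.
det = a11 * a22 - a12 * a21
b1 = Fraction(c1 * a22 - a12 * c2, det)
b2 = Fraction(a11 * c2 - a21 * c1, det)

print(b1, b2)                              # 5/3 3/2
print(f"{float(b1):.2f} {float(b2):.2f}")  # 1.67 1.50
```

The exact values 5/3 and 3/2 round to the 1.67 and 1.50 quoted in the text.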




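Here is the scatter diagram again.

[Figure: the three observations and a fitted line with intercept $b_1$ and slope $b_2$, with fitted values $\hat{Y}_1 = b_1 + b_2$, $\hat{Y}_2 = b_1 + 2b_2$, and $\hat{Y}_3 = b_1 + 3b_2$.]

True model: $Y = \beta_1 + \beta_2 X + u$
Fitted model: $\hat{Y} = b_1 + b_2 X$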


The fitted line and the fitted values of Y are as shown.

[Figure: the three observations and the fitted line, with fitted values $\hat{Y}_1 = 3.17$, $\hat{Y}_2 = 4.67$, and $\hat{Y}_3 = 6.17$.]

True model: $Y = \beta_1 + \beta_2 X + u$
Fitted model: $\hat{Y} = 1.67 + 1.50X$
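The fitted values follow directly from the fitted line. This short sketch uses the exact coefficients 5/3 and 3/2, which round to 1.67 and 1.50, so that the printed values match the figures to two decimal places. The variable names are illustrative.

```python
from fractions import Fraction

# Exact OLS coefficients for the numerical example (1.67 and 1.50 when rounded).
b1, b2 = Fraction(5, 3), Fraction(3, 2)
xs = [1, 2, 3]

# Fitted value for each observation: Y_hat = b1 + b2 * X.
fitted = [b1 + b2 * x for x in xs]
print([f"{float(v):.2f}" for v in fitted])  # ['3.17', '4.67', '6.17']
```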


Before we move on to the general case, it is as well to make a small but important mathematical point.



When we establish the expression for RSS, we do so as a function of $b_1$ and $b_2$. At this stage, $b_1$ and $b_2$ are not specific values. Our task is to determine the particular values that minimize RSS.



We should give these values special names, to differentiate them from the rest.



Obvious names would be $b_1^{OLS}$ and $b_2^{OLS}$, OLS standing for Ordinary Least Squares and meaning that these are the values that minimize RSS. We have re-written the first-order conditions and their solution accordingly.

$$\frac{\partial RSS}{\partial b_1} = 0 \;\Rightarrow\; 6b_1^{OLS} + 12b_2^{OLS} - 28 = 0$$

$$\frac{\partial RSS}{\partial b_2} = 0 \;\Rightarrow\; 12b_1^{OLS} + 28b_2^{OLS} - 62 = 0$$

$$b_1^{OLS} = 1.67, \quad b_2^{OLS} = 1.50$$


Now we will proceed to the general case with n observations.

[Figure: scatter diagram of Y against X for observations $(X_1, Y_1), \dots, (X_n, Y_n)$.]

True model: $Y = \beta_1 + \beta_2 X + u$


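Given our choice of $b_1$ and $b_2$, we will obtain a fitted line as shown.

[Figure: the n observations and a fitted line with intercept $b_1$ and slope $b_2$. The fitted values for the first and last observations are $\hat{Y}_1 = b_1 + b_2 X_1$ and $\hat{Y}_n = b_1 + b_2 X_n$.]

True model: $Y = \beta_1 + \beta_2 X + u$
Fitted model: $\hat{Y} = b_1 + b_2 X$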


The residual for the first observation is defined.

$$e_1 = Y_1 - \hat{Y}_1 = Y_1 - b_1 - b_2 X_1$$

[Figure: the fitted line with the residual $e_1$ marked as the vertical distance between $Y_1$ and $\hat{Y}_1$.]


Similarly, we define the residuals for the remaining observations. That for the last one is marked.

$$\begin{aligned}
e_1 &= Y_1 - \hat{Y}_1 = Y_1 - b_1 - b_2 X_1 \\
&\;\;\vdots \\
e_n &= Y_n - \hat{Y}_n = Y_n - b_1 - b_2 X_n
\end{aligned}$$

[Figure: the fitted line with the residuals $e_1$ and $e_n$ marked.]


RSS, the sum of the squares of the residuals, is defined for the general case. The data for the numerical example are shown for comparison.

General case:

$$RSS = e_1^2 + \dots + e_n^2 = (Y_1 - b_1 - b_2 X_1)^2 + \dots + (Y_n - b_1 - b_2 X_n)^2$$

Numerical example:

$$RSS = e_1^2 + e_2^2 + e_3^2 = (3 - b_1 - b_2)^2 + (5 - b_1 - 2b_2)^2 + (6 - b_1 - 3b_2)^2$$




The quadratics are expanded.

$$\begin{aligned}
RSS &= (Y_1 - b_1 - b_2 X_1)^2 + \dots + (Y_n - b_1 - b_2 X_n)^2 \\
&= Y_1^2 + b_1^2 + b_2^2 X_1^2 - 2b_1 Y_1 - 2b_2 X_1 Y_1 + 2b_1 b_2 X_1 \\
&\quad + \dots \\
&\quad + Y_n^2 + b_1^2 + b_2^2 X_n^2 - 2b_1 Y_n - 2b_2 X_n Y_n + 2b_1 b_2 X_n
\end{aligned}$$



$$RSS = \sum Y_i^2 + n b_1^2 + b_2^2 \sum X_i^2 - 2b_1 \sum Y_i - 2b_2 \sum X_i Y_i + 2b_1 b_2 \sum X_i$$

Like terms are added together.
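The general expression can be compared with the direct definition in code. In this sketch the function names `rss_direct` and `rss_from_sums` are illustrative. Applied to the numerical example, the six sums reproduce the coefficients 70, 3, 14, 28, 62, and 12 found earlier.

```python
def rss_direct(b1, b2, xs, ys):
    """Sum of squared residuals computed observation by observation."""
    return sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))


def rss_from_sums(b1, b2, xs, ys):
    """The same quantity written in terms of sums over the data."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_yy + n * b1**2 + b2**2 * sum_xx
            - 2 * b1 * sum_y - 2 * b2 * sum_xy + 2 * b1 * b2 * sum_x)


xs, ys = [1, 2, 3], [3, 5, 6]
# Integer trial values keep the comparison exact.
for b1 in range(-2, 3):
    for b2 in range(-2, 3):
        assert rss_direct(b1, b2, xs, ys) == rss_from_sums(b1, b2, xs, ys)
```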




Note that in this equation the observations on X and Y are just data that determine the coefficients in the expression for RSS.

General case:

$$RSS = \sum Y_i^2 + n b_1^2 + b_2^2 \sum X_i^2 - 2b_1 \sum Y_i - 2b_2 \sum X_i Y_i + 2b_1 b_2 \sum X_i$$

Numerical example:

$$RSS = 70 + 3b_1^2 + 14b_2^2 - 28b_1 - 62b_2 + 12b_1 b_2$$



The choice variables in the expression are $b_1$ and $b_2$. This may seem a bit strange because in elementary calculus courses $b_1$ and $b_2$ are usually constants and X and Y are variables.



However, if you have any doubts, compare what we are doing in the general case with what we did in the numerical example.



The first derivative with respect to $b_1$.

$$\frac{\partial RSS}{\partial b_1} = 0 \;\Rightarrow\; 2n b_1 - 2\sum Y_i + 2b_2 \sum X_i = 0$$

For comparison, the numerical example gave $6b_1 + 12b_2 - 28 = 0$.



With some simple manipulation we obtain a tidy expression for $b_1$.

$$n b_1 = \sum Y_i - b_2 \sum X_i$$

$$b_1 = \bar{Y} - b_2 \bar{X}$$
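The expression $b_1 = \bar{Y} - b_2 \bar{X}$ gives the intercept once the slope is known. As a hedged sketch, the function below (the name `intercept_given_slope` is an assumption) applies it to the numerical example, taking the slope 3/2 found there as given.

```python
from fractions import Fraction


def intercept_given_slope(b2, xs, ys):
    """Return b1 = Y_bar - b2 * X_bar for integer-valued data."""
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    y_bar = Fraction(sum(ys), n)
    return y_bar - b2 * x_bar


# Slope from the numerical example; the result matches b1 = 5/3, about 1.67.
print(intercept_given_slope(Fraction(3, 2), [1, 2, 3], [3, 5, 6]))  # 5/3
```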



The first derivative with respect to $b_2$.

$$\frac{\partial RSS}{\partial b_2} = 0 \;\Rightarrow\; 2b_2 \sum X_i^2 - 2\sum X_i Y_i + 2b_1 \sum X_i = 0$$

For comparison, the numerical example gave $12b_1 + 28b_2 - 62 = 0$.



Divide through by 2.

$$b_2 \sum X_i^2 - \sum X_i Y_i + b_1 \sum X_i = 0$$



We now substitute for $b_1$ using the expression obtained for it, $b_1 = \bar{Y} - b_2 \bar{X}$, and we thus obtain an equation that contains $b_2$ only.

$$b_2 \sum X_i^2 - \sum X_i Y_i + (\bar{Y} - b_2 \bar{X}) \sum X_i = 0$$



The definition of the sample mean has been used.

$$\bar{X} = \frac{\sum X_i}{n} \quad\Rightarrow\quad \sum X_i = n\bar{X}$$

$$b_2 \sum X_i^2 - \sum X_i Y_i + (\bar{Y} - b_2 \bar{X})\, n\bar{X} = 0$$



The last two terms have been disentangled.

$$b_2 \sum X_i^2 - \sum X_i Y_i + n\bar{X}\bar{Y} - n b_2 \bar{X}^2 = 0$$



Terms not involving $b_2$ have been transferred to the right side.

$$b_2 \left( \sum X_i^2 - n\bar{X}^2 \right) = \sum X_i Y_i - n\bar{X}\bar{Y}$$






Hence we obtain an expression for $b_2$.

$$b_2 = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$$
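A minimal sketch of this expression for $b_2$, assuming integer-valued data so that exact fractions can be used. The function name `slope_raw_sums` is illustrative. For the numerical example the numerator is 31 - 28 = 3 and the denominator is 14 - 12 = 2.

```python
from fractions import Fraction


def slope_raw_sums(xs, ys):
    """b2 = (sum(X*Y) - n*X_bar*Y_bar) / (sum(X^2) - n*X_bar^2)."""
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    y_bar = Fraction(sum(ys), n)
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    denominator = sum(x * x for x in xs) - n * x_bar**2
    return numerator / denominator


print(slope_raw_sums([1, 2, 3], [3, 5, 6]))  # 3/2
```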



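In practice, we shall use an alternative expression. We will demonstrate that it is equivalent.

$$b_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \quad\text{versus}\quad b_2 = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$$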



Expanding the numerator, we obtain the terms shown.

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \sum X_i \bar{Y} - \sum \bar{X} Y_i + \sum \bar{X}\bar{Y}$$



In the second term the mean value of Y is a common factor. In the third, the mean value of X is a common factor. The last term is the same for all i.

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \bar{Y} \sum X_i - \bar{X} \sum Y_i + n\bar{X}\bar{Y}$$



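We use the definitions of the sample means to simplify the expression.

$$\bar{X} = \frac{\sum X_i}{n} \;\Rightarrow\; \sum X_i = n\bar{X}, \qquad \bar{Y} = \frac{\sum Y_i}{n} \;\Rightarrow\; \sum Y_i = n\bar{Y}$$

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \bar{Y}(n\bar{X}) - \bar{X}(n\bar{Y}) + n\bar{X}\bar{Y}$$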



Hence we have shown that the numerators of the two expressions are the same.

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - n\bar{X}\bar{Y}$$



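The denominator is mathematically a special case of the numerator, replacing Y by X. Hence the expressions are equivalent.

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - n\bar{X}\bar{Y}$$

$$\sum (X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2$$

$$b_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$$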
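The equivalence can also be checked numerically. The sketch below computes the slope both ways with exact fractions. The function names and the second data set are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction


def slope_deviations(xs, ys):
    """b2 from deviations about the sample means."""
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    y_bar = Fraction(sum(ys), n)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def slope_raw_sums(xs, ys):
    """b2 from raw sums of products and squares."""
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    y_bar = Fraction(sum(ys), n)
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    denominator = sum(x * x for x in xs) - n * x_bar**2
    return numerator / denominator


# The numerical example, plus a hypothetical second data set.
for xs, ys in ([1, 2, 3], [3, 5, 6]), ([0, 1, 2, 4], [1, 3, 2, 7]):
    assert slope_deviations(xs, ys) == slope_raw_sums(xs, ys)
```

With floating-point data the two forms agree only up to rounding error, and the deviations form is usually the better behaved of the two.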





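We chose the parameters of the fitted line so as to minimize the sum of the squares of the residuals. As a result, we derived the expressions for $b_1$ and $b_2$.

$$b_1 = \bar{Y} - b_2 \bar{X}$$

$$b_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

[Figure: the n observations and the fitted line, with fitted values $\hat{Y}_1 = b_1 + b_2 X_1$ and $\hat{Y}_n = b_1 + b_2 X_n$.]

True model: $Y = \beta_1 + \beta_2 X + u$
Fitted model: $\hat{Y} = b_1 + b_2 X$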
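Putting the two expressions together gives a small routine for simple regression. This is a sketch under stated assumptions: the function name `ols_simple` is hypothetical, the data are integer-valued so exact fractions can be used, and the X values must not all be equal. The checks at the end restate the two first-order conditions: the residuals sum to zero, and so do the products of the residuals with X.

```python
from fractions import Fraction


def ols_simple(xs, ys):
    """Return (b1, b2) that minimize RSS for the model Y = b1 + b2 * X."""
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    y_bar = Fraction(sum(ys), n)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are equal, so the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


xs, ys = [1, 2, 3], [3, 5, 6]
b1, b2 = ols_simple(xs, ys)
print(b1, b2)  # 5/3 3/2

# First-order conditions, restated in terms of the residuals.
e = [y - b1 - b2 * x for x, y in zip(xs, ys)]
assert sum(e) == 0
assert sum(x * ei for x, ei in zip(xs, e)) == 0
```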



Again, we should make the mathematical point discussed in the context of the numerical example. These are the particular values of $b_1$ and $b_2$ that minimize RSS, and we should differentiate them from the rest by giving them special names, for example $b_1^{OLS}$ and $b_2^{OLS}$.

$$b_1^{OLS} = \bar{Y} - b_2^{OLS} \bar{X}$$

$$b_2^{OLS} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$



However, for the next few chapters, we shall mostly be concerned with the OLS estimators, and so the superscript 'OLS' is not really necessary. It will be dropped, to simplify the notation.



Typically, an intercept should be included in the regression specification. Occasionally, however, one may have reason to fit the regression without an intercept. In the case of a simple regression model, the true and fitted models become as shown.

True model: $Y = \beta_2 X + u$
Fitted model: $\hat{Y} = b_2 X$



We will derive the expression for $b_2$ from first principles using the least squares criterion. The residual in observation i is $e_i = Y_i - b_2 X_i$.

$$e_i = Y_i - \hat{Y}_i = Y_i - b_2 X_i$$




With this, we obtain the expression for the sum of the squares of the residuals.

$$RSS = \sum (Y_i - b_2 X_i)^2 = \sum Y_i^2 - 2b_2 \sum X_i Y_i + b_2^2 \sum X_i^2$$




We differentiate with respect to $b_2$. The OLS estimator is the value that makes this slope equal to zero (the first-order condition for a minimum). Note that we have distinguished properly between the general $b_2$ and the specific $b_2^{OLS}$.

$$\frac{dRSS}{db_2} = 2b_2 \sum X_i^2 - 2\sum X_i Y_i$$

$$2b_2^{OLS} \sum X_i^2 - 2\sum X_i Y_i = 0$$



Hence, we obtain the OLS estimator of $b_2$ for this model.

$$b_2^{OLS} = \frac{\sum X_i Y_i}{\sum X_i^2}$$





$$\frac{d^2 RSS}{db_2^2} = 2\sum X_i^2 > 0$$

The second derivative is positive, confirming that we have found a minimum.
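For illustration only, the no-intercept estimator can be applied to the three observations of the earlier numerical example; the slides themselves do not do this. The function name `slope_no_intercept` is an assumption, and the code assumes integer-valued data so that exact fractions can be used.

```python
from fractions import Fraction


def slope_no_intercept(xs, ys):
    """b2 = sum(X*Y) / sum(X^2) for the model Y = b2 * X (integer data assumed)."""
    return Fraction(sum(x * y for x, y in zip(xs, ys)), sum(x * x for x in xs))


def rss_no_intercept(b2, xs, ys):
    """Sum of squared residuals for the fitted model Y_hat = b2 * X."""
    return sum((y - b2 * x) ** 2 for x, y in zip(xs, ys))


xs, ys = [1, 2, 3], [3, 5, 6]
b2 = slope_no_intercept(xs, ys)
print(b2)  # 31/14

# Second derivative of RSS with respect to b2: positive unless every X is zero.
second_derivative = 2 * sum(x * x for x in xs)
assert second_derivative > 0

# Moving b2 in either direction increases RSS, as expected at a minimum.
step = Fraction(1, 10)
assert rss_no_intercept(b2, xs, ys) < rss_no_intercept(b2 + step, xs, ys)
assert rss_no_intercept(b2, xs, ys) < rss_no_intercept(b2 - step, xs, ys)
```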





Copyright Christopher Dougherty 2012.

These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 1.3 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/ .

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course EC2020 Elements of Econometrics www.londoninternational.ac.uk/lse .