Lecture13-14


University of Hong Kong
Introductory Econometrics (ECON0701), Spring 2014
24 February 2014

Administrative Matters You are probably wondering about the scope of the midterm examination, which will be held on Monday, March 17th. It will cover chapters 2, 3, 4, parts of 6, and 7 (i.e., up to Monday, March 3rd; chapter 6 is only to the extent discussed in lecture). Also, you may have with you one A4 sheet of notes, handwritten on one side. The definition of handwritten is: the note sheet must be produced with no technology other than a pen or pencil.

Multiple Regression Analysis: Inference Last time we discussed testing hypotheses that involve more than one parameter. The method is to define a third parameter that is zero when the hypothesis you want to test is true. For example, to test βi = βj you would pick the parameter θ = βi − βj.

Then manipulate the equation so you can estimate θ directly, and test the hypothesis that this third parameter is equal to zero.

Multiple Regression Analysis: Inference For example, consider the model

y = β0 + β1x1 + β2x2 + u

and we want to test the hypothesis that β1 = β2. Substitute θ = β1 − β2 for β1 (i.e., β1 = θ + β2):

y = β0 + (θ + β2)x1 + β2x2 + u


Multiple Regression Analysis: Inference For a second example, we will look at campaign finance expenditures. The research question is whether there is evidence that candidates' expenditures exactly offset each other. The specific hypothesis is that if candidate A increases her spending by some proportion, and candidate B increases his spending by the same proportion, the result would be the same as if neither had increased their expenditures.

Multiple Regression Analysis: Inference The specific model we have in mind is

voteA = β0 + β1 ln(expendA) + β2 ln(expendB) + β3 prtystrA + u

where voteA is the percent of the vote received by candidate A, expendA and expendB are A's and B's campaign expenditures, and prtystrA is a measure of the strength of A's party (the percent voting for A's party in the last election).

Multiple Regression Analysis: Inference

. d

Contains data from vote1.dta
  obs:           173
 vars:            10                          25 Jun 1999 14:07
 size:         5,190 (99.3% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
voteA           byte   %5.2f                  percent vote for A
expendA         float  %8.2f                  camp. expends. by A, $1000s
expendB         float  %8.2f                  camp. expends. by B, $1000s
prtystrA        byte   %5.2f                  % vote for president
lexpendA        float  %9.0g                  log(expendA)
lexpendB        float  %9.0g                  log(expendB)
-------------------------------------------------------------------------------
Sorted by:


Multiple Regression Analysis: Inference First let's take a look at the regression result.

. regress voteA lexpendA lexpendB prtystrA

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  3,   169) =  215.23
       Model |  38405.1096     3  12801.7032           Prob > F      =  0.0000
    Residual |  10052.1389   169   59.480112           R-squared     =  0.7926
-------------+------------------------------           Adj R-squared =  0.7889
       Total |  48457.2486   172  281.728189           Root MSE      =  7.7123

------------------------------------------------------------------------------
       voteA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lexpendA |   6.083316     .38215    15.92   0.000     5.328914    6.837719
    lexpendB |  -6.615417   .3788203   -17.46   0.000    -7.363246   -5.867588
    prtystrA |   .1519574   .0620181     2.45   0.015     .0295274    .2743873
       _cons |   45.07893   3.926305    11.48   0.000     37.32801    52.82985
------------------------------------------------------------------------------

Multiple Regression Analysis: Inference The hypothesis that we want to test is that proportional increases in spending by candidate B exactly offset proportional increases in spending by candidate A, i.e., that β2 = −β1.

If this is true, then β1 + β2 = 0. Therefore, let the parameter θ equal β1 + β2, and substitute β1 = θ − β2.

Multiple Regression Analysis: Inference

θ = β1 + β2
β1 = θ − β2

voteA = β0 + β1 ln(expendA) + β2 ln(expendB) + β3 prtystrA + u
voteA = β0 + (θ − β2) ln(expendA) + β2 ln(expendB) + β3 prtystrA + u
voteA = β0 + θ ln(expendA) + β2 [ln(expendB) − ln(expendA)] + β3 prtystrA + u


Multiple Regression Analysis: Inference We can run this regression by creating a new variable lexpendBA, which is equal to lexpendB − lexpendA:

. regress voteA lexpendA lexpendBA prtystrA

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  3,   169) =  215.23
       Model |  38405.1097     3  12801.7032           Prob > F      =  0.0000
    Residual |  10052.1388   169  59.4801115           R-squared     =  0.7926
-------------+------------------------------           Adj R-squared =  0.7889
       Total |  48457.2486   172  281.728189           Root MSE      =  7.7123

------------------------------------------------------------------------------
       voteA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lexpendA |   -.532101   .5330858    -1.00   0.320    -1.584466    .5202638
   lexpendBA |  -6.615417   .3788203   -17.46   0.000    -7.363246   -5.867588
    prtystrA |   .1519574   .0620181     2.45   0.015     .0295274    .2743873
       _cons |   45.07893   3.926305    11.48   0.000     37.32801    52.82985
------------------------------------------------------------------------------

Multiple Regression Analysis: Inference This suggests the following general strategy for testing a hypothesis that is a function of more than one parameter. Define a parameter θ that is equal to zero when the hypothesis is true.

For example, if the hypothesis is β1 + β2 = 3, subtract the 3 from both sides and define θ = β1 + β2 − 3.

Isolate one parameter in terms of θ and the other parameters; for example, β1 = θ − β2 + 3.

Multiple Regression Analysis: Inference Substitute for the parameter you isolated (in this example, β1) in the regression equation. This will suggest another regression to run, in which θ is the coefficient on one of the variables and the remaining parameters are coefficients on the transformed variables.

Run that regression and see if you can reject the hypothesis that θ = 0. If you can, you can reject the original hypothesis, since we picked θ to be equal to zero when the original hypothesis is true.
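As a concrete sketch of this strategy, here is a small simulation (hypothetical data and coefficients, not one of the course datasets) that tests H0: β1 + β2 = 3 by estimating θ = β1 + β2 − 3 directly:

```python
import numpy as np

# Reparameterization sketch: test H0: beta1 + beta2 = 3.
# Define theta = beta1 + beta2 - 3, so beta1 = theta - beta2 + 3.
# Substituting into y = b0 + beta1*x1 + beta2*x2 + u gives
#   y - 3*x1 = b0 + theta*x1 + beta2*(x2 - x1) + u,
# so theta is the coefficient on x1 when regressing (y - 3*x1)
# on a constant, x1, and (x2 - x1).
rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = 0.1 * rng.normal(size=n)
# True coefficients chosen so that H0 holds: 1.8 + 1.2 = 3
y = 0.5 + 1.8 * x1 + 1.2 * x2 + u

X = np.column_stack([np.ones(n), x1, x2 - x1])
coef, *_ = np.linalg.lstsq(X, y - 3 * x1, rcond=None)
theta_hat = coef[1]
print(theta_hat)  # close to zero, since H0 is true in this simulation
```

In practice you would also look at the t statistic on x1, exactly as in the Stata examples.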


Multiple Regression Analysis: Inference This same idea can be used to test hypotheses about the predicted value from a regression.

For example, suppose we use a multiple regression model to predict students' average GPA, given their SAT scores and some characteristics of their high school. How accurate is this prediction? The idea is to define a parameter θ that will be equal to the prediction, given certain values of the independent variables; then we have all the information we need (the standard error) to do hypothesis testing about the prediction.

Multiple Regression Analysis: Inference For example, suppose we have the following linear regression model for GPA in college:

colGPA = β0 + β1 SAT + β2 hsperc + β3 hsize + β4 hsize² + u

where colGPA is the student's college GPA, SAT is the student's SAT score, hsperc is the student's percentile ranking in high school (i.e., 1 = top 1%), and hsize is the size of the student's high school, in hundreds.

Multiple Regression Analysis: Inference

. d

Contains data from gpa2.dta
  obs:         4,137
 vars:            12                          25 May 2002 14:39
 size:       157,206 (98.5% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
sat             int    %10.0g                 combined SAT score
tothrs          int    %10.0g                 total hours through fall semest
colgpa          float  %9.0g                  GPA after fall semester
athlete         byte   %8.0g                  =1 if athlete
verbmath        float  %9.0g                  verbal/math SAT score
hsize           double %10.0g                 size grad. class, 100s
hsrank          int    %10.0g                 rank in grad. class
hsperc          float  %9.0g                  high school percentile, from top
female          byte   %9.0g                  =1 if female
white           byte   %9.0g                  =1 if white
black           byte   %9.0g                  =1 if black
hsizesq         float  %9.0g                  hsize^2
-------------------------------------------------------------------------------
Sorted by:


Multiple Regression Analysis: Inference

. regress colgpa sat hsperc hsize hsizesq

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  4,  4132) =  398.02
       Model |  499.030504     4  124.757626           Prob > F      =  0.0000
    Residual |  1295.16517  4132  .313447524           R-squared     =  0.2781
-------------+------------------------------           Adj R-squared =  0.2774
       Total |  1794.19567  4136  .433799728           Root MSE      =  .55986

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0014925   .0000652    22.89   0.000     .0013646    .0016204
      hsperc |  -.0138558    .000561   -24.70   0.000    -.0149557   -.0127559
       hsize |  -.0608815   .0165012    -3.69   0.000    -.0932328   -.0285302
     hsizesq |   .0054603   .0022698     2.41   0.016     .0010102    .0099104
       _cons |   1.492652   .0753414    19.81   0.000     1.344942    1.640362
------------------------------------------------------------------------------

Multiple Regression Analysis: Inference Now, what if we want to construct a confidence interval for the expected GPA of a student with 1200 on the SAT, in the 30th percentile of his high school class, from a high school of 500 students?

This time,

θ = β0 + 1200β1 + 30β2 + 5β3 + 25β4

(hsize is measured in hundreds, so a school of 500 students has hsize = 5 and hsize² = 25).

Multiple Regression Analysis: Inference Now, substitute for β0 (= θ − 1200β1 − 30β2 − 5β3 − 25β4) in the original regression equation:

colGPA = β0 + β1 SAT + β2 hsperc + β3 hsize + β4 hsize² + u
colGPA = θ + β1(SAT − 1200) + β2(hsperc − 30) + β3(hsize − 5) + β4(hsize² − 25) + u

This suggests that you can find θ̂ by regressing colGPA on (SAT − 1200), (hsperc − 30), (hsize − 5), and (hsize² − 25); θ̂ is then the intercept.


Multiple Regression Analysis: Inference We can do that by first transforming the X variables:

. generate sat0=sat-1200

. generate hsperc0=hsperc-30

. generate hsize0=hsize-5

. generate hsizesq0=hsizesq-25

Multiple Regression Analysis: Inference

. regress colgpa sat0 hsperc0 hsize0 hsizesq0

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  4,  4132) =  398.02
       Model |  499.030503     4  124.757626           Prob > F      =  0.0000
    Residual |  1295.16517  4132  .313447524           R-squared     =  0.2781
-------------+------------------------------           Adj R-squared =  0.2774
       Total |  1794.19567  4136  .433799728           Root MSE      =  .55986

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        sat0 |   .0014925   .0000652    22.89   0.000     .0013646    .0016204
     hsperc0 |  -.0138558    .000561   -24.70   0.000    -.0149557   -.0127559
      hsize0 |  -.0608815   .0165012    -3.69   0.000    -.0932328   -.0285302
    hsizesq0 |   .0054603   .0022698     2.41   0.016     .0010102    .0099104
       _cons |   2.700075   .0198778   135.83   0.000     2.661104    2.739047
------------------------------------------------------------------------------

The results tell us that the expected GPA of a student with these characteristics is 2.70. The 95% confidence interval for the prediction is [2.66, 2.74].
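The device of centering the regressors at the target values is general: the intercept of the centered regression equals the fitted value at that point. A minimal sketch with simulated data (hypothetical numbers, not the gpa2 sample):

```python
import numpy as np

# With regressors shifted by the target values (c1, c2), the OLS intercept
# equals the fitted value of y at x1 = c1, x2 = c2.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(50, 10, size=n)
x2 = rng.normal(5, 2, size=n)
y = 1.0 + 0.03 * x1 + 0.2 * x2 + rng.normal(0, 0.5, size=n)

c1, c2 = 60.0, 4.0  # point at which we want the predicted value

# Ordinary regression, then predict at (c1, c2)
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = b[0] + b[1] * c1 + b[2] * c2

# Centered regression: the intercept is the same prediction
Xc = np.column_stack([np.ones(n), x1 - c1, x2 - c2])
bc, *_ = np.linalg.lstsq(Xc, y, rcond=None)
print(pred, bc[0])  # identical up to floating-point error
```

The payoff is that the regression output then reports the standard error of that prediction directly, as the standard error of _cons.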

Multiple Regression Analysis: Inference Note that the confidence interval for the average GPA of a student with these characteristics is not the same as the confidence interval for a prediction of the actual GPA of an individual student with these characteristics.

This is because the unobservables, u, contribute to the variance in the individual student's GPA.


Multiple Regression Analysis: Inference In particular, consider the variance of the GPA of student i:

colGPA_i = colGPAhat_i + u_i
Var(colGPA_i) = Var(colGPAhat_i) + Var(u_i)

(where colGPAhat_i denotes the fitted value). We just found a confidence interval for the fitted value of that student's college GPA, not her GPA itself.

As we collect more and more data, the sampling variance of the fitted value will decline, because we can estimate it more accurately, but the variance of the unobservables doesn't change. So there is a limit to how accurately we can predict an individual's GPA, no matter how much we know about other people's GPAs.

Multiple Regression Analysis: Inference To find the standard error of the forecast of an individual student's GPA, we write (with colGPAhat_i the fitted value)

Var(colGPA_i) = Var(colGPAhat_i) + Var(u_i)
Se(colGPA_i) = sqrt( Var(colGPAhat_i) + Var(u_i) )
Se(colGPA_i) = sqrt( Se(colGPAhat_i)² + Var(u_i) )

The standard error of θ̂ from the regression output is the estimated standard error of the fitted value of college GPA. The estimated square root of the variance of u is given in the regression output as Root MSE (root mean squared error).

Multiple Regression Analysis: Inference Therefore, the standard error of the forecast of an individual student's college GPA, with 1200 on the SAT, in the 30th percentile of her high school class, from a high school of 500 students, is:

Se(colGPA_i) = sqrt( Se(colGPAhat_i)² + Var(u_i) )
             = sqrt( 0.0198778² + 0.55986² )
             ≈ 0.5602


Multiple Regression Analysis: Inference How can we construct a confidence interval for the forecast? Since there are many observations (4000+), the sampling distribution is very close to normal, so we can use 1.96 as the 95% critical value. (This distribution is also t with n − k − 1 degrees of freedom.)

Multiple Regression Analysis: Inference The 95% confidence interval for the forecast is, using the generic form [θ̂ − c·Se(θ̂), θ̂ + c·Se(θ̂)] with c = 1.96:

[2.70 − 1.96 × 0.560, 2.70 + 1.96 × 0.560] = [1.60, 3.80]

This interval will contain a particular student's GPA 95% of the time, if she has the indicated characteristics.

This is very different from the confidence interval for the prediction, which contains the expected value 95% of the time.

Multiple Regression Analysis: Inference This means that, though the factors in the regression were important, we cannot use them to accurately pin down an individual's GPA. There are many factors other than SAT scores and high school performance that determine it as well (and these are included in u).

Multiple Regression Analysis: Inference Now we will discuss testing joint hypotheses, that is, the hypothesis that two or more facts are both true. One common use of joint hypothesis testing is to test a set of exclusion restrictions: the hypothesis that a group of variables do not affect the dependent variable once the other variables have been included in the model.
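A quick arithmetic check of the forecast standard error and interval, plugging in the numbers from the regression output above:

```python
import math

# se(fitted) is the std. err. of _cons in the centered regression (0.0198778);
# the Root MSE estimates the standard deviation of u (0.55986).
se_fitted = 0.0198778
root_mse = 0.55986

se_forecast = math.sqrt(se_fitted**2 + root_mse**2)
print(round(se_forecast, 4))  # 0.5602

theta_hat = 2.700075
lo = theta_hat - 1.96 * se_forecast
hi = theta_hat + 1.96 * se_forecast
print(round(lo, 2), round(hi, 2))  # 1.6 3.8, i.e., the interval [1.60, 3.80]
```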


Multiple Regression Analysis: Inference For example, if we have the linear model

y = β0 + β1x1 + β2x2 + β3x3 + u

and we want to test the hypothesis that neither x2 nor x3 has an effect on y once x1 has been included in the model, we are testing the joint hypothesis that

H0: β2 = 0, β3 = 0

Multiple Regression Analysis: Inference It is not appropriate to do two t tests on the single hypotheses that β2 and β3 are equal to zero and assume that if you fail to reject both hypotheses separately, you fail to reject the joint hypothesis.

Particularly if x2 and x3 are highly correlated, they may be jointly significant but not individually so.

Multiple Regression Analysis: Inference For example, when analyzing the relationship between one's extramarital affairs, one's age, and one's duration of marriage, we knew that older people (who were married for longer) had more affairs.

But we did not know if it was because they were old or if it was because they were married for a long time.

In this case, we might not have evidence to say that either variable (age or duration of marriage) has an effect on the number of affairs.

Multiple Regression Analysis: Inference Therefore, age and duration may not be individually significant. But surely at least one of the variables (age or duration of marriage) does have an effect, so they are likely jointly significant.

In other words, we can't reject the hypothesis that age is not related to the number of affairs, or the hypothesis that duration of marriage is not related to the number of affairs; but both of these cannot be true at once. Therefore we might reject the joint hypothesis, but not the individual hypotheses.


Multiple Regression Analysis: Inference The idea behind testing a set of exclusion restrictions is to run two regressions: one with the variables we want to test and one without.

The next step is to examine the sums of squared residuals of the two regressions. If the sum of squared residuals declines by a very small amount when the extra variables are included, they may be unnecessary, and we fail to reject the hypothesis that their coefficients are jointly equal to zero.

If it declines by a large amount, they are important and we reject the hypothesis that they have no effect on the dependent variable.

    Break

Multiple Regression Analysis: Inference Last time we discussed testing hypotheses that involve more than one parameter. The method is to define a third parameter that is zero when the hypothesis you want to test is true. For example, to test βi = βj you would pick the parameter θ = βi − βj.

Then manipulate the equation so you can estimate θ directly, and test the hypothesis that this third parameter is equal to zero.

Multiple Regression Analysis: Inference For example, consider the model

y = β0 + β1x1 + β2x2 + u

and we want to test the hypothesis that β1 = β2. Substitute θ = β1 − β2 for β1 (i.e., β1 = θ + β2):

y = β0 + (θ + β2)x1 + β2x2 + u


Multiple Regression Analysis: Inference Now collect terms:

y = β0 + θx1 + β2(x1 + x2) + u

So what we can do is define a third variable z = x1 + x2, and run this regression:

y = β0 + θx1 + β2z + u

If θ̂, the coefficient on x1, is not significantly different from zero, then β1 and β2 are not significantly different from each other.

Multiple Regression Analysis: Inference We then started to talk about how to test joint hypotheses. The idea behind testing a set of exclusion restrictions is to run two regressions: one with the variables we want to test and one without. The next step is to examine the sums of squared residuals of the two regressions. If the sum of squared residuals declines by a very small amount when the extra variables are included, they may be unnecessary, and we fail to reject the hypothesis that their coefficients are jointly equal to zero.

Multiple Regression Analysis: Inference If it declines by a large amount, they are important and we reject the hypothesis that they have no effect on the dependent variable. For example, suppose that we are examining the salaries of professional baseball players. We want to find out whether players' performance statistics (career batting average, number of home runs, etc.) are related to salaries.

Multiple Regression Analysis: Inference The specific model we have in mind is:

ln(salary) = β0 + β1 years + β2 gamesyr + β3 bavg + β4 hrunsyr + β5 rbisyr + u

The hypothesis we are interested in testing is the joint hypothesis that none of the performance statistics (bavg, hrunsyr, rbisyr) have any effect on salaries:

H0: β3 = 0, β4 = 0, β5 = 0


Multiple Regression Analysis: Inference

Contains data from D:\Econometrics\Statafiles\MLB1.DTA
  obs:           353
 vars:            47                          16 Sep 1996 15:53
 size:        46,949 (95.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
salary          float  %9.0g                  1993 season salary
years           byte   %9.0g                  years in major leagues
bavg            float  %9.0g                  career batting average
gamesyr         float  %9.0g                  games per year in league
hrunsyr         float  %9.0g                  home runs per year
rbisyr          float  %9.0g                  rbis per year
lsalary         float  %9.0g                  log(salary)
-------------------------------------------------------------------------------
Sorted by:

Multiple Regression Analysis: Inference Let's first look at the regression model without the performance statistics.

. regress lsalary years gamesyr

      Source |       SS       df       MS              Number of obs =     353
-------------+------------------------------           F(  2,   350) =  259.32
       Model |  293.864058     2  146.932029           Prob > F      =  0.0000
    Residual |  198.311477   350  .566604221           R-squared     =  0.5971
-------------+------------------------------           Adj R-squared =  0.5948
       Total |  492.175535   352  1.39822595           Root MSE      =  .75273

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       years |    .071318    .012505     5.70   0.000     .0467236    .0959124
     gamesyr |   .0201745   .0013429    15.02   0.000     .0175334    .0228156
       _cons |    11.2238    .108312   103.62   0.000     11.01078    11.43683
------------------------------------------------------------------------------

Multiple Regression Analysis: Inference Now let's look at the model with the performance statistics:

. regress lsalary years gamesyr bavg hrunsyr rbisyr

      Source |       SS       df       MS              Number of obs =     353
-------------+------------------------------           F(  5,   347) =  117.06
       Model |  308.989208     5  61.7978416           Prob > F      =  0.0000
    Residual |  183.186327   347  .527914487           R-squared     =  0.6278
-------------+------------------------------           Adj R-squared =  0.6224
       Total |  492.175535   352  1.39822595           Root MSE      =  .72658

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       years |   .0688626   .0121145     5.68   0.000     .0450355    .0926898
     gamesyr |   .0125521   .0026468     4.74   0.000     .0073464    .0177578
        bavg |   .0009786   .0011035     0.89   0.376    -.0011918     .003149
     hrunsyr |   .0144295    .016057     0.90   0.369    -.0171518    .0460107
      rbisyr |   .0107657    .007175     1.50   0.134    -.0033462    .0248776
       _cons |   11.19242   .2888229    38.75   0.000     10.62435    11.76048
------------------------------------------------------------------------------


Multiple Regression Analysis: Inference Notice that the residual sum of squares diminished from 198.3 to 183.2. This means that the model is fitting better: since the residuals are smaller, there is less distance between the actual and the fitted values. The question is, is this improvement enough to justify including the three variables in the model, or could it just be random noise?

Multiple Regression Analysis: Inference In order to answer this question, we need to construct an F statistic:

F = [(SSR_r − SSR_ur) / q] / [SSR_ur / (n − k − 1)]

where SSR_ur is the sum of squared residuals for the unrestricted model, SSR_r is the sum of squared residuals for the restricted model, n is the number of observations, k is the number of variables in the unrestricted model, and q is the number of restrictions (the number of variables excluded in the restricted model).

Multiple Regression Analysis: Inference Under the null hypothesis, the F statistic will have an F distribution (the F distribution is constructed especially for doing hypothesis testing of this type) with (q, n − k − 1) degrees of freedom. Note that the F distribution has TWO parameters, q and n − k − 1, not just one like the t distribution. You need both to do an F test.

Multiple Regression Analysis: Inference In this case, the F statistic is equal to

F = [(198.311 − 183.186) / 3] / [183.186 / (353 − 5 − 1)] = 9.55

The F statistic is always positive. When the F statistic exceeds the critical value for an F test with (q, n − k − 1) degrees of freedom, you reject the null hypothesis.
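The arithmetic is easy to verify by plugging the two residual sums of squares from the regression outputs above into the F formula:

```python
# F statistic for the exclusion restrictions H0: beta3 = beta4 = beta5 = 0
ssr_r = 198.311   # SSR of the restricted model (years, gamesyr only)
ssr_ur = 183.186  # SSR of the unrestricted model (adds bavg, hrunsyr, rbisyr)
n, k, q = 353, 5, 3

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
print(round(F, 2))  # 9.55
```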


Multiple Regression Analysis: Inference In this case, looking at Table G.3b in the textbook tells us that the critical value for an F distribution with (3, 120) degrees of freedom is 2.68.

We actually want (3, 347) degrees of freedom, but this is close. Since 9.55 is much greater than 2.68, we clearly reject the hypothesis that none of the performance statistics influence baseball players' salary.

Multiple Regression Analysis: Inference Stata can do F tests:

. test bavg hrunsyr rbisyr

 ( 1)  bavg = 0.0
 ( 2)  hrunsyr = 0.0
 ( 3)  rbisyr = 0.0

       F(  3,   347) =    9.55
            Prob > F =    0.0000

Multiple Regression Analysis: Inference Why are the variables jointly significant but not individually? It turns out they are highly correlated:

. corr bavg hrunsyr rbisyr
(obs=353)

             |     bavg  hrunsyr   rbisyr
-------------+---------------------------
        bavg |   1.0000
     hrunsyr |   0.1906   1.0000
      rbisyr |   0.3291   0.8907   1.0000

So we know that the best players get paid more, but whether it is because of home runs, batting average, or something else, we can't tell.


Multiple Regression Analysis: Inference It is also possible to test a set of general linear restrictions, as opposed to merely exclusion restrictions (hypotheses that some of the parameters are equal to zero).

As an example we will look at the rationality of housing price assessments. If assessors of housing prices are doing their job correctly, then their assessment should include the value of everything you can observe about the house; for example, the number of bedrooms and the number of square feet.

Multiple Regression Analysis: Inference In a regression context, this means that if you are predicting the sale price of a house based on its assessed value, you shouldn't need variables for the number of bedrooms or the size of the property.

If you do, it means the price assessment hasn't taken these variables into account properly.

Multiple Regression Analysis: Inference Specifically, the model that we have in mind is

ln(price) = β0 + β1 ln(assess) + β2 ln(lotsize) + β3 ln(sqrft) + β4 bdrms + u

This is the unrestricted model. The hypothesis we want to test is:

H0: β1 = 1, β2 = 0, β3 = 0, β4 = 0

Multiple Regression Analysis: Inference We can construct the restricted model by substituting the restrictions (β1 = 1, β2 = 0, β3 = 0, β4 = 0) into the unrestricted model. The restricted model is then

ln(price) = β0 + ln(assess) + u
ln(price) − ln(assess) = β0 + u


Multiple Regression Analysis: Inference

. d

Contains data from hprice1.dta
  obs:            88
 vars:            10                          17 Mar 2002 12:21
 size:         3,168 (99.5% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
price           float  %9.0g                  house price, $1000s
assess          float  %9.0g                  assessed value, $1000s
bdrms           byte   %9.0g                  number of bdrms
lotsize         float  %9.0g                  size of lot in square feet
sqrft           int    %9.0g                  size of house in square feet
colonial        byte   %9.0g                  =1 if home is colonial style
lprice          float  %9.0g                  log(price)
lassess         float  %9.0g                  log(assess)
llotsize        float  %9.0g                  log(lotsize)
lsqrft          float  %9.0g                  log(sqrft)
-------------------------------------------------------------------------------
Sorted by:

Multiple Regression Analysis: Inference The unrestricted model is:

. regress lprice lassess llotsize lsqrft bdrms

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  4,    83) =   70.58
       Model |  6.19607473     4  1.54901868           Prob > F      =  0.0000
    Residual |  1.82152879    83   .02194613           R-squared     =  0.7728
-------------+------------------------------           Adj R-squared =  0.7619
       Total |  8.01760352    87  .092156362           Root MSE      =  .14814

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lassess |   1.043065    .151446     6.89   0.000     .7418453    1.344285
    llotsize |   .0074379   .0385615     0.19   0.848    -.0692593    .0841352
      lsqrft |  -.1032384   .1384305    -0.75   0.458     -.378571    .1720942
       bdrms |   .0338392   .0220983     1.53   0.129    -.0101135    .0777918
       _cons |    .263743   .5696647     0.46   0.645    -.8692972    1.396783
------------------------------------------------------------------------------


    Multiple Regression Analysis: Inference To estimate the restricted model, we create a new dependent variable equal to ln(price) − ln(assess) and regress it on a constant only:

    . generate lpassess=lprice-lassess

    . regress lpassess

          Source |       SS       df       MS              Number of obs =      88
    -------------+------------------------------           F(  0,    87) =    0.00
           Model |        0.00     0        .              Prob > F      =       .
        Residual |  1.88014885    87  .021610906           R-squared     =  0.0000
    -------------+------------------------------           Adj R-squared =  0.0000
           Total |  1.88014885    87  .021610906           Root MSE      =  .14701

    ------------------------------------------------------------------------------
        lpassess |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |  -.0848135   .0156709    -5.41   0.000    -.1159612   -.0536658
    ------------------------------------------------------------------------------

    Multiple Regression Analysis: Inference In this case, q (the number of restrictions) is 4, n (the number of observations) is 88, and k (the number of variables in the unrestricted model) is 4. The F statistic for the hypothesis test is:

    F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n − k − 1)] = [(1.8801 − 1.8215)/4] / [1.8215/(88 − 4 − 1)] ≈ 0.667

    From table G.3b, the 5% critical value for an F distribution with (4, 90) degrees of freedom (the closest tabulated entry to our actual (4, 83)) is 2.47. 0.667 is clearly less than that; therefore we fail to reject the hypothesis that assessed prices accurately take into account observable characteristics of the property (β1 = 1; β2 = 0; β3 = 0; β4 = 0).

    Multiple Regression Analysis: Inference Another form of the F statistic that is sometimes seen is the R-squared form. This derivation requires the following facts, which follow from the definition of R-squared:

    R² = SSE/SST = 1 − SSR/SST
    1 − R² = SSR/SST
    SSR = SST·(1 − R²)
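    The F statistic computed above for the housing example can be checked with a few lines of Python (a minimal sketch; the SSR values are copied from the Stata output):

    ```python
    # F statistic (SSR form) for the housing example.
    ssr_r = 1.8801    # SSR of the restricted model (regress lpassess)
    ssr_ur = 1.8215   # SSR of the unrestricted model
    q = 4             # number of restrictions
    n = 88            # number of observations
    k = 4             # regressors in the unrestricted model

    F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
    print(F)  # roughly 0.667, well below the 5% critical value of 2.47
    ```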


    Multiple Regression Analysis: Inference

    Multiple Regression Analysis: Inference Note: this form of the F statistic is ONLY valid when you are testing exclusion restrictions (hypotheses that parameters are jointly equal to zero). The derivation assumes SST is the same in the restricted and unrestricted models, which requires that both have the same dependent variable. If you are testing other types of restrictions, like the example we just did (where the restricted model's dependent variable became ln(price) − ln(assess)), you must use the SSR form.

    Multiple Regression Analysis: Inference A general strategy for testing multiple linear hypotheses is: First, estimate the unrestricted model, and find its residual sum of squares. Next, assume the hypotheses are true, and use these assumptions to find an equation for the restricted model. For example, assuming a coefficient is equal to zero means the regression model will no longer include that variable.

    Multiple Regression Analysis: Inference Next, estimate the restricted model, and find the residual sum of squares for that model. Use the residual sums of squares from the restricted model and the unrestricted model, along with q (the number of restrictions), n (the number of observations), and k (the number of variables in the unrestricted model), to calculate an F statistic for the test.

    The derivation of the R-squared form of the F statistic, using SSR = SST·(1 − R²):

    F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n − k − 1)]
      = {[SST(1 − R²_r) − SST(1 − R²_ur)]/q} / {[SST(1 − R²_ur)]/(n − k − 1)}
      = {[(1 − R²_r) − (1 − R²_ur)]/q} / {[(1 − R²_ur)]/(n − k − 1)}
      = [(R²_ur − R²_r)/q] / [(1 − R²_ur)/(n − k − 1)]
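    Because SSR = SST(1 − R²) and SST is the same in both models, the SSR form and the R-squared form must produce the same number. A small sketch with hypothetical values (SST and the two R-squareds are invented purely for illustration) confirms the equivalence:

    ```python
    # Verify numerically that the SSR form and R-squared form of F agree.
    # All values below are hypothetical, chosen only for illustration.
    sst = 10.0        # total sum of squares (same dependent variable in both models)
    r2_ur = 0.75      # R-squared of the unrestricted model
    r2_r = 0.60       # R-squared of the restricted model
    q, n, k = 3, 100, 6

    # SSR form, using SSR = SST * (1 - R^2)
    ssr_ur = sst * (1 - r2_ur)
    ssr_r = sst * (1 - r2_r)
    f_ssr = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

    # R-squared form
    f_r2 = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))

    print(abs(f_ssr - f_r2) < 1e-9)  # True: the two forms coincide
    ```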


    Multiple Regression Analysis: Inference Find the critical value for the F test by looking at the appropriate table (G.3 in your textbook) for an F distribution with (q, n − k − 1) degrees of freedom.

    You must choose the significance level: G.3a is 10%, G.3b is 5%, and G.3c is 1%. In these tables, q is the numerator degrees of freedom, and n − k − 1 is the denominator degrees of freedom.

    If the F statistic exceeds the critical value, you reject the hypothesis. Otherwise you fail to reject the hypothesis.
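    The whole recipe can be collected into a small helper function (a sketch; the function name is made up for illustration, and it assumes the critical value has already been looked up by hand in table G.3):

    ```python
    def f_test(ssr_r, ssr_ur, q, n, k, critical_value):
        """F statistic (SSR form) plus the reject / fail-to-reject decision.

        ssr_r, ssr_ur  -- residual sums of squares (restricted / unrestricted)
        q              -- number of restrictions (numerator df)
        n, k           -- observations and unrestricted regressors;
                          n - k - 1 is the denominator df
        critical_value -- taken from table G.3 at the chosen significance level
        """
        f = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
        return f, f > critical_value  # (statistic, reject H0?)

    # Housing example: F is about 0.67, below 2.47, so we fail to reject.
    f, reject = f_test(1.8801, 1.8215, q=4, n=88, k=4, critical_value=2.47)
    print(round(f, 2), reject)
    ```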