ps4_fall2015


7/21/2019 ps4_fall2015

http://slidepdf.com/reader/full/ps4fall2015 1/8

Department of Economics W3412
Columbia University Fall 2015

Problem Set 4

Introduction to Econometrics

Profs. Seyhan Erden and Miikka Rokkanen
for all sections.

Part I.

True, False, Uncertain with Explanation:

(a) One can still use a linear regression framework even if the relation between a regressor and the dependent variable is not linear.

(b) Including an interaction term between two independent variables, $X_1$ and $X_2$, allows for the measurement of the effect of a unit increase in $X_1$ and $X_2$, above and beyond the sum of the individual effects of a unit increase in the two variables alone.

(c) To decide whether $Y_i = \beta_0 + \beta_1 X_i + u_i$ or $\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$ fits the data better, you should examine the regression $R^2$.

Part II.

1. Consider the following multiple regression model:

$$Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + u_i, \qquad E[u_i \mid X_{1,i}, X_{2,i}] = 0$$

(a) Suppose $X_{2,i} = X_{1,i} + 2$. Can you compute the OLS coefficients? Explain.

(b) Assume again that $X_{2,i} = X_{1,i} + 2$. Can you write a single variable model $Y_i = \gamma_0 + \gamma_1 X_{1,i} + u_i$ which is equivalent to the multiple regression model above? Can you compute the OLS coefficients of this single variable model? What is the intuition here?

(c) Consider the alternative model: $Y_i = \beta_1 X_{1,i} + \beta_2 X_{2,i} + u_i$, where again $X_{2,i} = X_{1,i} + 2$. Can you compute the OLS coefficients in this model? Explain.

(d) Assume again $X_{2,i} = X_{1,i} + 2$. Can you write a single variable model $Y_i = \gamma_0 + \gamma_1 X_{1,i} + u_i$ equivalent to the multiple regression model in (c)? Can you compute the OLS coefficients of this single variable model? What is the intuition here?
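Whether OLS coefficients can be computed comes down to whether the matrix $X'X$ is invertible. The sketch below checks this on a small made-up sample in which the second regressor is an exact linear function of the first. The data values, the relation `x2 = x1 + 2`, and the helper names `gram` and `det` are all assumptions chosen for illustration, not part of the problem set.

```python
def gram(columns):
    """Return X'X for a design matrix supplied as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det(m):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


# Hypothetical integer data, so the determinants are exact.
x1 = [1, 2, 3, 4, 5]
x2 = [x + 2 for x in x1]      # assumed exact linear relation
const = [1] * len(x1)         # column of ones for the intercept

det_with_intercept = det(gram([const, x1, x2]))
det_without_intercept = det(gram([x1, x2]))
print(det_with_intercept, det_without_intercept)
```

A zero determinant means $X'X$ has no inverse, so the OLS formula has no unique solution for that set of regressors; a nonzero determinant means the coefficients can be computed.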

2. Use Table 2 to answer the following questions. Table 2 presents the results of four regressions, one in each column. Estimate the indicated regressions and fill in the values (you may either handwrite or type the entries in; if you choose to type up the table, an electronic copy of Table 2 in .doc format is available on the course Web site). For example, to fill in column (2), estimate the regression with colGPA as the dependent variable and hsGPA and skipped as the independent variables, using the “robust” option, and fill in the estimated coefficients.


(a) Fill out the table with the necessary numbers; some will be in the STATA output, and some you will need to calculate yourself.

(b) Common sense predicts that your high school GPA (hsGPA) and the number of classes you skipped (skipped) are determinants of your college GPA (colGPA). Use regression (2) to test the hypothesis (at the 5% significance level) that the coefficients on these two economic variables are all zero, against the alternative that at least one coefficient is nonzero.

(c) Find the F-statistic for regression (3) and explain what it is testing.

(d) Find the F-statistic for regression (4) and explain what it is testing.

(e) Are bgfriend (whether you have a boy/girlfriend) and campus (whether you live on campus) jointly significant determinants of college GPA? Use regressions (2) and (4) to test your hypothesis (i.e., use the homoskedasticity-only F-statistic formula, eq. 7.14 in the book, instead of testing directly with STATA).
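As a rough aid for part (e), the homoskedasticity-only F-statistic can be computed from the $R^2$ of the unrestricted and restricted regressions. The sketch below uses the $R^2$ form of that statistic, which is assumed here to be the formula the problem cites as eq. 7.14. The function name and the two $R^2$ values are hypothetical and are not results from GPA4.dta.

```python
def homoskedastic_f(r2_unrestricted, r2_restricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic from two R^2 values.

    q is the number of restrictions being tested and k_unrestricted is the
    number of regressors (excluding the intercept) in the unrestricted model.
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator


# Hypothetical inputs: n = 141 matches the sample size in Table 1, but the
# R^2 values and k_unrestricted = 5 are made up for illustration.
f_stat = homoskedastic_f(r2_unrestricted=0.30, r2_restricted=0.25,
                         q=2, n=141, k_unrestricted=5)
print(round(f_stat, 3))
```

The resulting statistic is compared with the $F_{q,\infty}$ critical value; for $q = 2$ at the 5% level that value is 3.00.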

Table 1

Definitions of Variables in GPA4.dta (data is from Wooldridge textbook)

Variable    Definition
colGPA      Cumulative college grade point average of a sample of 141 students at Michigan State University in 1994.
hsGPA       High school GPA of students.
skipped     Average number of classes skipped per week.
PC          = 1 if the student owns a personal computer; = 0 otherwise.
bgfriend    = 1 if the student answered “yes” to having a boy/girl friend question; = 0 otherwise.
campus      = 1 if the student lives on campus; = 0 otherwise.


Table 2

College GPA Results
Dependent variable: colGPA

Regressor                         (1)    (2)    (3)    (4)
hsGPA
                                  ( )    ( )    ( )    ( )
skipped
                                  ( )    ( )    ( )    ( )
PC                                __
                                         ( )    ( )    ( )
bgfriend                          __     __
                                                ( )    ( )
campus                            __     __     __
                                                       ( )
Intercept
                                  ( )    ( )    ( )    ( )

F-statistics testing the hypothesis that the population coefficients on the indicated regressors are all zero:
hsGPA, skipped
                                  ( )    ( )    ( )    ( )
hsGPA, skipped, PC                __
                                         ( )    ( )    ( )
hsGPA, skipped, PC, bgfriend,     __     __
                                                ( )    ( )
bgfriend, campus                  __     __     __
                                                       ( )

Regression summary statistics
$R^2$
$\bar{R}^2$
Regression RMSE
n

Notes: Heteroskedasticity-robust standard errors are given in parentheses under estimated coefficients, and p-values are given in parentheses under F-statistics. The F-statistics are heteroskedasticity-robust.


3. The TeachingRatings data set contains data on course evaluations, course characteristics, and professor characteristics for 463 courses for the academic years 2000-2002 at the University of Texas at Austin. These data were provided by Professor Daniel Hamermesh of the University of Texas at Austin and were used in his paper with Amy Parker, “Beauty in the Classroom: Instructors’ Pulchritude and Putative Pedagogical Productivity,” Economics of Education Review, August 2005, Vol. 24, No. 4, pp. 369-376.

Course_eval: “Course overall” teaching evaluation score, on a scale of 1 (very unsatisfactory) to 5 (excellent)
Beauty: Rating of instructor physical appearance by a panel of six students, averaged across the six panelists, shifted to have mean zero.
Female = 1 if the instructor is female, 0 if the instructor is male
Minority = 1 if the instructor is non-White, 0 if the instructor is White
NNenglish = 1 if the instructor is not a native English speaker, 0 if the instructor is a native English speaker
Intro = 1 if the course is introductory (mainly large Freshman and Sophomore courses), 0 if the course is not introductory
Onecredit = 1 if the course is a single-credit elective (yoga, aerobics, dance, etc.), 0 otherwise
Age: Professor’s age

(a) Regress Course_eval on Beauty and female, and test the hypothesis that all population coefficients are jointly significant at the 5% significance level.

(b) Regress Course_eval on Beauty, female, minority and age, and test the hypothesis that all population coefficients are jointly significant at the 5% significance level.

(c) Now test if minority and age are jointly significant at the 1% significance level using the results from part (a) and part (b).

(d) Consider the various control variables in the data set. Which do you think should be included in the regression? Using a table like Table 3, examine the effect of Beauty on Course_eval. (Hint: Stata does not list adjusted $R^2$ under the robust option. The command to see adjusted $R^2$ is ereturn list r2_a)
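If the adjusted $R^2$ is not shown in the output, it can also be recovered by hand from the ordinary $R^2$, the sample size, and the number of regressors. The sketch below applies the standard formula $\bar{R}^2 = 1 - \frac{n-1}{n-k-1}(1 - R^2)$. The function name and the $R^2$ value are hypothetical; only n = 463 comes from the problem statement.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k regressors (excluding the intercept)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


# n = 463 courses as stated in the problem; R^2 = 0.10 is a made-up value.
adj = adjusted_r2(r2=0.10, n=463, k=2)
print(round(adj, 4))
```

Because the factor $(n-1)/(n-k-1)$ exceeds one whenever $k \geq 1$, the adjusted value is always below the unadjusted $R^2$ when $R^2 < 1$.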


Table 3

Teaching Ratings
Dependent variable: Course_eval

Regressor
(standard error below)            (1)    (2)    (3)    (4)
beauty
                                  ( )    ( )    ( )    ( )
female
                                  ( )    ( )    ( )    ( )
minority                          __
                                         ( )    ( )    ( )
nnenglish                         __     __
                                                ( )    ( )
intro                             __     __     __
                                                       ( )
onecredit                         __     __     __
                                                       ( )
age                               __     __     __
                                                       ( )
intercept
                                  ( )    ( )    ( )    ( )

F-statistics testing the null hypothesis: population coefficients on the following regressors are all zero (p-value below):
beauty, female
                                  ( )    ( )    ( )    ( )
beauty, female, minority          __
                                         ( )    ( )    ( )
beauty, female,
minority, nnenglish               __     __
                                                ( )    ( )
intro, onecredit                  __     __     __
                                                       ( )
minority, age                     __     __     __
                                                       ( )
intro, age                        __     __     __
                                                       ( )

Regression summary statistics
$R^2$
$\bar{R}^2$
Regression RMSE
n

Notes: Heteroskedasticity-robust standard errors are given in parentheses under estimated coefficients, and p-values are given in parentheses under F-statistics. The F-statistics are heteroskedasticity-robust.


4. The Lawsch85 data set was collected by Kelly Barnett, an MSU economics student, for use in a term project. The data come from two sources: The Official Guide to U.S. Law Schools, 1986, Law School Admission Services, and The Gourman Report: A Ranking of Graduate and Professional Programs in American and International Universities, 1995, Washington, D.C.

(a) Regress salary on north, south, east and west to analyze the effects of regions on the salary of Law School graduates. What is wrong with this regression? Why can you not do this?

(b) How would you correct the problem in part (a)?

(c) Interpret the coefficient of east under your correction strategy in part (b).

5. Does the separation of corporate control from corporate ownership lead to inflated executive salaries and worse firm performance? George Stigler and Claire Friedland have addressed these questions empirically using a sample of firms.¹ A subset of their data are in the file execcomp.dta. The variables in the file are described in Table 4.

Table 4

Definitions of Variables in execcomp.dta

Variable    Definition
ecomp       Average total amount of compensation in thousands of dollars for a firm’s top three executives.
assets      Firm’s assets in millions of dollars.
profits     Firm’s annual profits in millions of dollars.
mcontrol    A dummy variable indicating management control of the firm: = 1 for management-controlled firms; = 0 for ownership-controlled firms.

(a) Regress executives’ compensation on the firm’s assets and profits, the control dummy, and an intercept term. What proportion of the variation in top executives’ compensation in this sample is accounted for by these variables?

(b) If the firm’s profits rise by one million dollars, by how much do you estimate the top executives’ average compensation will change, if assets and the form of control remain fixed?

(c) What is the estimated difference between the expected average compensations of top executives in management-controlled firms and those in ownership-controlled firms, if assets and profits remain fixed?

(d) Regress firm profits on firm assets and the management-control dummy. How much of the variation in the firm’s profits in this sample can be accounted for by the variation in the firm’s assets and the form of control?

(e) Are the empirical results in (a) and (d) consistent with the claim that management control hurts firm performance and leads to higher pay for executives?

¹ George J. Stigler and Claire Friedland, “The Literature of Economics: The Case of Berle and Means,” Journal of Law and Economics 26, no. 2 (June 1983): 237-268.


6. Consider the following STATA output on college distances. This dataset contains data from a random sample of high school seniors interviewed in 1980 and re-interviewed in 1986. In this exercise you will use these data to investigate the relationship between the number of completed years of education for young adults and the distance from each student's high school to the nearest four-year college. The variable ed corresponds to years of education, and dist is the distance to the nearest college, measured in tens of miles (for example, dist = 3 means that the senior's high school is 30 miles from the nearest college).

. reg ed dist, robust

Linear regression                               Number of obs     =      3796
                                                F(  1,  3794)     =     29.83
                                                Prob > F          =    0.0000
                                                R-squared         =    0.0074
                                                Root MSE          =    1.8074

------------------------------------------------------------------------------
             |               Robust
          ed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dist |  -.0733727   .0134334    -5.46   0.000    -.0997101   -.0470353
       _cons |   13.95586   .0378112   369.09   0.000     13.88172    14.02999
------------------------------------------------------------------------------
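To see how the columns of this output relate to one another, the sketch below recomputes the t-statistic and the 95% confidence interval for dist from the reported coefficient and robust standard error. It assumes the large-sample normal critical value 1.96, whereas STATA uses a t critical value with 3794 degrees of freedom, so the endpoints agree only to about four decimal places. The function name is hypothetical.

```python
coef_dist = -0.0733727   # coefficient on dist, copied from the output
se_dist = 0.0134334      # robust standard error, copied from the output


def confidence_interval(coef, se, critical_value):
    """Two-sided interval coef +/- critical_value * se."""
    half_width = critical_value * se
    return coef - half_width, coef + half_width


t_stat = coef_dist / se_dist
# 1.96 is the assumed large-sample critical value for a 95% interval.
lower, upper = confidence_interval(coef_dist, se_dist, critical_value=1.96)
print(round(t_stat, 2), round(lower, 4), round(upper, 4))
```

The same function gives an interval at another confidence level by changing the critical value, for example 2.58 for 99%.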

(a) A student’s high school was 18 miles from the nearest college. Estimate the number of years of schooling completed.

(b) Compute the 99% confidence interval for the difference in the predicted years of education between a high school senior whose school is 93 miles from the nearest college and another student who attends a high school that shares a campus with a college. Explain what your solution means in one sentence.

(c) Does distance to the nearest college explain a lot of the variation in educational attainment? Explain.

(d) Suppose distance was measured in kilometers such that 10 miles = 16 kilometers. Replicate the entire STATA output.

(e) Interpret the coefficient of tuition below, where the dependent variable, led, is the natural logarithm of years of education. Give one good explanation for your answer. (Note that tuition is given in $1000.)

Linear regression                               Number of obs     =      3796
                                                F(  3,  3792)     =    151.91
                                                Prob > F          =    0.0000
                                                R-squared         =    0.1001
                                                Root MSE          =    .12236

------------------------------------------------------------------------------
             |               Robust
         led |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     tuition |   .0158511   .0069175     2.29   0.022     .0022887    .0294135
     momcoll |   .0474716   .0063938     7.42   0.000      .034936    .0600071
     dadcoll |   .0749874   .0055234    13.58   0.000     .0641583    .0858164
       _cons |   2.582142   .0065834   392.22   0.000     2.569234    2.595049
------------------------------------------------------------------------------
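In a regression with a logged dependent variable, a coefficient is usually read as an approximate proportional change in the dependent variable per one-unit change in the regressor. The sketch below converts the reported tuition coefficient into a percentage change in two ways: the usual small-change approximation and the exact change implied by the log specification. The variable names are hypothetical; one unit of tuition is $1000, as stated in the problem.

```python
import math

coef_tuition = 0.0158511  # coefficient on tuition, copied from the output

# Approximate percentage change in years of education per $1000 of tuition.
approx_pct = 100 * coef_tuition
# Exact percentage change implied by the log-linear specification.
exact_pct = 100 * (math.exp(coef_tuition) - 1)
print(round(approx_pct, 3), round(exact_pct, 3))
```

For a coefficient this small the two figures are nearly identical, which is why the approximation is normally used when stating the interpretation.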


The following questions will not be graded; they are for you to practice and will be discussed at the recitation:

7. SW Empirical Exercise 6.3

8. SW Exercise 7.1

9. SW Exercise 7.4

10. SW Empirical Exercises 7.1