  • 8/12/2019 ps06sol

    14.32 Problem Set 6 Solutions

    Paul Schrimpf

    May 19, 2009

    A. Wooldridge

    Solutions from Instructor's Manual for 2nd Edition

    8 points each

    15.2 (i) It seems reasonable to assume that dist and u are uncorrelated because classrooms are not usually assigned with convenience for particular students in mind.

    (ii) The variable dist must be partially correlated with atndrte. More precisely, in the reduced form

    \[ \mathit{atndrte} = \pi_0 + \pi_1 \mathit{priGPA} + \pi_2 \mathit{ACT} + \pi_3 \mathit{dist} + v \]

    we must have $\pi_3 \neq 0$. Given a sample of data we can test $H_0: \pi_3 = 0$ against $H_1: \pi_3 \neq 0$ using a t test.

    (iii) We now need instrumental variables for atndrte and the interaction term, $\mathit{priGPA} \cdot \mathit{atndrte}$. (Even though priGPA is exogenous, atndrte is not, and so $\mathit{priGPA} \cdot \mathit{atndrte}$ is generally correlated with u.) Under the exogeneity assumption that $E(u \mid \mathit{priGPA}, \mathit{ACT}, \mathit{dist}) = 0$, any function of priGPA, ACT, and dist is uncorrelated with u. In particular, the interaction $\mathit{priGPA} \cdot \mathit{dist}$ is uncorrelated with u. If dist is partially correlated with atndrte then $\mathit{priGPA} \cdot \mathit{dist}$ is partially correlated with $\mathit{priGPA} \cdot \mathit{atndrte}$. So, we can estimate the equation

    \[ \mathit{stndfnl} = \beta_0 + \beta_1 \mathit{atndrte} + \beta_2 \mathit{priGPA} + \beta_3 \mathit{ACT} + \beta_4 \mathit{priGPA} \cdot \mathit{atndrte} + u \]

    by 2SLS using IVs dist, priGPA, ACT, and $\mathit{priGPA} \cdot \mathit{dist}$. It turns out this is not generally optimal. It may be better to add $\mathit{priGPA}^2$ and $\mathit{priGPA} \cdot \mathit{ACT}$ to the instrument list. This would give us overidentifying restrictions to test. See Wooldridge (2002, Chapters 5 and 9) for further discussion.
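The relevance check in part (ii) of 15.2 can be sketched numerically. The example below is only an illustration on simulated data: the sample size, the coefficient values, and the error variance are assumptions made up for the sketch, not values from the course data. It runs the reduced-form regression by OLS and forms the usual t statistic for the coefficient on dist.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated stand-ins for the reduced-form variables (not real data).
priGPA = rng.normal(2.6, 0.5, n)
ACT = rng.normal(22.0, 3.5, n)
dist = rng.uniform(0.0, 5.0, n)
# Assumed data-generating process with pi_3 = -2, so dist is relevant.
atndrte = 50.0 + 10.0 * priGPA + 0.5 * ACT - 2.0 * dist + rng.normal(0.0, 5.0, n)

# OLS of atndrte on a constant, priGPA, ACT, and dist.
X = np.column_stack([np.ones(n), priGPA, ACT, dist])
pi_hat, *_ = np.linalg.lstsq(X, atndrte, rcond=None)
resid = atndrte - X @ pi_hat
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

# t statistic for H0: pi_3 = 0 (coefficient on dist).
t_dist = pi_hat[3] / se[3]
print(f"pi_3 hat = {pi_hat[3]:.3f}, t statistic = {t_dist:.2f}")
```

A large absolute t statistic rejects $H_0: \pi_3 = 0$, which is the evidence of instrument relevance that part (ii) asks for.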

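    15.3 It is easiest to use (15.10) but where we drop $\bar{z}$. Remember, this is allowed because

    \[ \sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x}) = \sum_{i=1}^{n} z_i (x_i - \bar{x}), \]

    and similarly when we replace $x$ with $y$. So the numerator in the formula for $\hat{\beta}_1$ is

    \[ \sum_{i=1}^{n} z_i (y_i - \bar{y}) = \sum_{i=1}^{n} z_i y_i - \Big( \sum_{i=1}^{n} z_i \Big) \bar{y} = n_1 \bar{y}_1 - n_1 \bar{y} \qquad (1) \]

    where $n_1 = \sum_{i=1}^{n} z_i$ is the number of observations with $z_i = 1$ and we have used the fact that $\big( \sum_{i=1}^{n} z_i y_i \big) / n_1 = \bar{y}_1$, the average of the $y_i$ with $z_i = 1$. So far, we have shown that the numerator in $\hat{\beta}_1$ is $n_1 (\bar{y}_1 - \bar{y})$.

    Next, write $\bar{y}$ as a weighted average of the averages over the two subgroups:

    \[ \bar{y} = (n_0/n) \bar{y}_0 + (n_1/n) \bar{y}_1 \]

    where $n_0 = n - n_1$. Therefore,

    \[ \bar{y}_1 - \bar{y} = [(n - n_1)/n] \bar{y}_1 - (n_0/n) \bar{y}_0 = (n_0/n)(\bar{y}_1 - \bar{y}_0). \]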

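    Therefore, the numerator of $\hat{\beta}_1$ can be written as

    \[ (n_0 n_1 / n)(\bar{y}_1 - \bar{y}_0). \]

    By simply replacing $y$ with $x$, the denominator in $\hat{\beta}_1$ can be expressed as $(n_0 n_1 / n)(\bar{x}_1 - \bar{x}_0)$. When we take the ratio of these, the terms involving $n_0$, $n_1$, and $n$ cancel, leaving

    \[ \hat{\beta}_1 = \frac{\bar{y}_1 - \bar{y}_0}{\bar{x}_1 - \bar{x}_0}. \]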
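The result of 15.3 can be checked numerically: with a binary instrument, the IV slope from the covariance formula equals the ratio of differences in subgroup means. The sketch below uses simulated data; the sample size and coefficients are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
z = rng.integers(0, 2, n)                 # binary instrument (0 or 1)
x = 1.0 + 0.8 * z + rng.normal(size=n)    # regressor shifted by the instrument
y = 2.0 + 0.5 * x + rng.normal(size=n)    # outcome

# IV slope from the covariance form of the estimator.
beta_iv = np.sum((z - z.mean()) * (y - y.mean())) / np.sum((z - z.mean()) * (x - x.mean()))

# Grouping form: ratio of differences in subgroup means.
beta_wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())

print(beta_iv, beta_wald)
```

The two numbers agree up to floating-point rounding because the equality is an algebraic identity, not an approximation.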

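    15.6 (i) Plugging (15.26) into (15.22) and rearranging gives

    \[ y_1 = \beta_0 + \beta_1 (\pi_0 + \pi_1 z_1 + \pi_2 z_2 + v_2) + \beta_2 z_1 + u_1 \]

    \[ \phantom{y_1} = (\beta_0 + \beta_1 \pi_0) + (\beta_1 \pi_1 + \beta_2) z_1 + \beta_1 \pi_2 z_2 + u_1 + \beta_1 v_2 \]

    and so $\alpha_0 = \beta_0 + \beta_1 \pi_0$, $\alpha_1 = \beta_1 \pi_1 + \beta_2$, and $\alpha_2 = \beta_1 \pi_2$.

    (ii) From the equation in part (i), $v_1 = u_1 + \beta_1 v_2$.

    (iii) By assumption, $u_1$ has zero mean and is uncorrelated with $z_1$ and $z_2$, and $v_2$ has these properties by definition. So $v_1$ has zero mean and is uncorrelated with $z_1$ and $z_2$, which means that OLS consistently estimates the $\alpha_j$. [OLS would only be unbiased if we add the stronger assumptions $E(u_1 \mid z_1, z_2) = E(v_2 \mid z_1, z_2) = 0$.]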

    15.7 I made some minor changes to the solution from the instructor's manual.

    (i) Even at a given income level, some students are more motivated and more able than others, and their families are more supportive (say, in terms of providing transportation) and enthusiastic about education. Therefore, there is likely to be a self-selection problem: students that would do better anyway were also more likely to attend a choice school.

    (ii) Yes, since $u_1$ does not contain income, random assignment of grants within income class means that grant designation is not correlated with unobservables such as student ability, motivation, and family support.

    (iii) The reduced form is

    \[ \mathit{choice} = \pi_0 + \pi_1 \mathit{faminc} + \pi_2 \mathit{grant} + v_2 \]

    and we need $\pi_2 \neq 0$. In other words, after accounting for income, the grant amount must have some effect on choice. This seems reasonable, provided the grant amounts differ within each income class.

    (iv) To obtain the reduced form for score we simply substitute the reduced form for choice into the equation of interest. This leads to an equation for score that is a linear function of the exogenous variables. For suitably defined $\alpha_j$, we have:

    \[ \mathit{score} = \alpha_0 + \alpha_1 \mathit{faminc} + \alpha_2 \mathit{grant} + v_1. \]

    This equation allows us to directly estimate the effect of increasing the grant amount on the test score, holding family income fixed. From a policy perspective this is itself of some interest.

    15.10 (i) Better and more serious students tend to go to college, and these same kinds of students may be attracted to private and, in particular, Catholic high schools. The resulting correlation between u and CathHS is another example of a self-selection problem: students self-select toward Catholic high schools, rather than being randomly assigned to them.

    (ii) A standardized score is a measure of student ability, so this can be used as a proxy variable in an OLS regression. Having this measure in an OLS regression should be an improvement over having no proxies for student ability.

    (iii) The first requirement is that CathRel must be uncorrelated with unobserved student motivation and ability (whatever is not captured by any proxies) and other factors in the error term. This holds if growing up Catholic (as opposed to attending a Catholic high school) does not make you a better student. It seems reasonable to assume that Catholics do not have more innate ability than non-Catholics. Whether being Catholic is unrelated to student motivation, or preparation for high school, is a thornier issue.

    The second requirement is that being Catholic has an effect on attending a Catholic high school, controlling for the other exogenous factors that appear in the structural model. This can be tested by estimating a reduced form equation of the form

    \[ \mathit{CathHS} = \pi_0 + \pi_1 \mathit{CathRel} + (\text{other exogenous factors}) + (\text{reduced form error}). \]

    (iv) Evans and Schwab (1995) find that being Catholic substantially increases the probability of attending a Catholic high school. Further, it seems that assuming CathRel is exogenous in the structural equation is reasonable. See Evans and Schwab (1995) for an in-depth analysis.

    B. Angrist and Krueger

    12 points

    Use the data set on the course web site to replicate the OLS and IV estimates of the returns to schooling for the 1980 Census reported in Table V, Columns 1 through 4, of the paper by Angrist and Krueger (1991). Organize your results into a table that has the same layout as Table V. Show both your results and the Angrist and Krueger results side by side.

    See do and log file. The results are identical to those in Angrist and Krueger. A good solution would include a nice looking table of the results.

    C. More Wooldridge

    8 points each

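    16.1 (i) If $\alpha_1 = 0$ then $y_1 = \beta_1 z_1 + u_1$, and so the right-hand side depends only on the exogenous variable $z_1$ and the error term $u_1$. This then is the reduced form for $y_1$. If $\alpha_2 = 0$, the reduced form for $y_1$ is $y_1 = \beta_2 z_2 + u_2$. (Note that having both $\alpha_1$ and $\alpha_2$ equal zero is not interesting as it implies the bizarre condition $u_2 - u_1 = \beta_1 z_1 - \beta_2 z_2$.)

    If $\alpha_1 \neq 0$ and $\alpha_2 = 0$, we can plug $y_1 = \beta_2 z_2 + u_2$ into the first equation and solve for $y_2$:

    \[ \beta_2 z_2 + u_2 = \alpha_1 y_2 + \beta_1 z_1 + u_1 \]

    or

    \[ \alpha_1 y_2 = -\beta_1 z_1 + \beta_2 z_2 + u_2 - u_1. \]

    Dividing by $\alpha_1$ (because $\alpha_1 \neq 0$) gives

    \[ y_2 = -(\beta_1/\alpha_1) z_1 + (\beta_2/\alpha_1) z_2 + (u_2 - u_1)/\alpha_1 = \pi_{21} z_1 + \pi_{22} z_2 + v_2 \]

    where $\pi_{21} = -\beta_1/\alpha_1$, $\pi_{22} = \beta_2/\alpha_1$, and $v_2 = (u_2 - u_1)/\alpha_1$. Note that the reduced form for $y_2$ generally depends on $z_1$ and $z_2$ (as well as on $u_1$ and $u_2$).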

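    (ii) If we multiply the second structural equation by $(\alpha_1/\alpha_2)$ and subtract it from the first structural equation, we obtain

    \[ y_1 - (\alpha_1/\alpha_2) y_1 = \alpha_1 y_2 - \alpha_1 y_2 + \beta_1 z_1 - (\alpha_1/\alpha_2) \beta_2 z_2 + u_1 - (\alpha_1/\alpha_2) u_2 \]

    \[ \phantom{y_1 - (\alpha_1/\alpha_2) y_1} = \beta_1 z_1 - (\alpha_1/\alpha_2) \beta_2 z_2 + u_1 - (\alpha_1/\alpha_2) u_2 \]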

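    or

    \[ [1 - (\alpha_1/\alpha_2)] y_1 = \beta_1 z_1 - (\alpha_1/\alpha_2) \beta_2 z_2 + u_1 - (\alpha_1/\alpha_2) u_2. \]

    Because $\alpha_1 \neq \alpha_2$, $1 - (\alpha_1/\alpha_2) \neq 0$, and so we can divide the equation by $1 - (\alpha_1/\alpha_2)$ to obtain the reduced form for $y_1$: $y_1 = \pi_{11} z_1 + \pi_{12} z_2 + v_1$, where $\pi_{11} = \beta_1/[1 - (\alpha_1/\alpha_2)]$, $\pi_{12} = -(\alpha_1/\alpha_2) \beta_2/[1 - (\alpha_1/\alpha_2)]$, and $v_1 = [u_1 - (\alpha_1/\alpha_2) u_2]/[1 - (\alpha_1/\alpha_2)]$. A reduced form does exist for $y_2$, as can be seen by subtracting the second equation from the first:

    \[ 0 = (\alpha_1 - \alpha_2) y_2 + \beta_1 z_1 - \beta_2 z_2 + u_1 - u_2; \]

    because $\alpha_1 \neq \alpha_2$, we can rearrange and divide by $\alpha_1 - \alpha_2$ to obtain the reduced form.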

    (iii) In supply and demand examples, $\alpha_1 \neq \alpha_2$ is very reasonable. If the first equation is the supply function, we generally expect $\alpha_1 > 0$, and if the second equation is the demand function, $\alpha_2 < 0$. The reduced forms can exist even in cases where the supply function is not upward sloping and the demand function is not downward sloping, but we might question the usefulness of such models.
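The reduced forms in 16.1 (ii) can be sanity-checked by solving the structural system directly for one set of numbers. The sketch assumes the two structural equations are $y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1$ and $y_1 = \alpha_2 y_2 + \beta_2 z_2 + u_2$, which is the form the algebra above implies; all parameter values and draws are hypothetical.

```python
import numpy as np

# Hypothetical parameter values with alpha1 != alpha2, both nonzero.
alpha1, alpha2 = 0.8, -1.2
beta1, beta2 = 2.0, 3.0

# One arbitrary draw of exogenous variables and structural errors.
z1, z2, u1, u2 = 1.5, -0.7, 0.3, -0.4

# Solve the structural system for (y1, y2):
#   y1 - alpha1*y2 = beta1*z1 + u1
#   y1 - alpha2*y2 = beta2*z2 + u2
A = np.array([[1.0, -alpha1], [1.0, -alpha2]])
b = np.array([beta1 * z1 + u1, beta2 * z2 + u2])
y1, y2 = np.linalg.solve(A, b)

# Reduced form for y1 from part (ii).
k = 1.0 - alpha1 / alpha2
pi11 = beta1 / k
pi12 = -(alpha1 / alpha2) * beta2 / k
v1 = (u1 - (alpha1 / alpha2) * u2) / k
y1_rf = pi11 * z1 + pi12 * z2 + v1

# Reduced form for y2 from 0 = (alpha1 - alpha2)*y2 + beta1*z1 - beta2*z2 + u1 - u2.
y2_rf = (-beta1 * z1 + beta2 * z2 + u2 - u1) / (alpha1 - alpha2)

print(y1, y1_rf, y2, y2_rf)
```

Setting `alpha1` equal to `alpha2` makes the matrix `A` singular, which is the numerical counterpart of the reduced form failing to exist.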

    16.2 Using simple economics, the first equation must be the demand function, as it depends on income, which is a common determinant of demand. The second equation contains a variable, rainfall, that affects crop production and therefore corn supply.

    16.5 (i) Other things equal, a higher rate of condom usage should reduce the rate of sexually transmitted diseases (STDs). So $\beta_1 < 0$.

    (ii) If students having sex behave rationally, and condom usage does prevent STDs, then condom usage should increase as the rate of infection increases.

    (iii) If we plug the structural equation for infrate into $\mathit{conuse} = \gamma_0 + \gamma_1 \mathit{infrate} + \ldots$, we see that conuse depends on $\gamma_1 u_1$. Because $\gamma_1 > 0$, conuse is positively related to $u_1$. In fact, if the structural error ($u_2$) in the conuse equation is uncorrelated with $u_1$, $\mathrm{Cov}(\mathit{conuse}, u_1) = \gamma_1 \mathrm{Var}(u_1) > 0$. If we ignore the other explanatory variables in the infrate equation, we can use equation (5.4) to obtain the direction of bias: $\mathrm{plim}(\hat{\beta}_1) - \beta_1 > 0$ because $\mathrm{Cov}(\mathit{conuse}, u_1) > 0$, where $\hat{\beta}_1$ denotes the OLS estimator. Since we think $\beta_1 < 0$, OLS is biased towards zero. In other words, if we use OLS on the infrate equation, we are likely to underestimate the importance of condom use in reducing STDs. (Remember, the more negative is $\beta_1$, the more effective is condom usage.)

    (iv) We would have to assume that condis does not appear, in addition to conuse, in the infrate equation. This seems reasonable, as it is usage that should directly affect STDs, and not just having a distribution program. But we must also assume condis is exogenous in the infrate equation: it cannot be correlated with unobserved factors (in $u_1$) that also affect infrate. We must also assume that condis has some partial effect on conuse, something that can be tested by estimating the reduced form for conuse. It seems likely that this requirement for an IV (see equations (15.30) and (15.31)) is satisfied.
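The direction of the OLS bias argued in 16.5 (iii) can be illustrated with a small simulation. This is a sketch under strong assumptions: the other explanatory variables and intercepts are dropped, the two structural errors are independent standard normals, and the values $\beta_1 = -0.5$ (effect of conuse on infrate) and $\gamma_1 = 0.3$ (effect of infrate on conuse) are invented for the example. Solving the two equations exactly scales the covariance between conuse and $u_1$ by the positive factor $1/(1 - \gamma_1 \beta_1)$, so its sign is still that of $\gamma_1$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta1, gamma1 = -0.5, 0.3    # assumed signs: beta1 < 0, gamma1 > 0

u1 = rng.normal(size=n)
u2 = rng.normal(size=n)      # drawn independently of u1

# Solve the two-equation system (other regressors and intercepts omitted):
#   infrate = beta1 * conuse + u1
#   conuse  = gamma1 * infrate + u2
conuse = (gamma1 * u1 + u2) / (1.0 - gamma1 * beta1)
infrate = beta1 * conuse + u1

# OLS slope of infrate on conuse.
c = conuse - conuse.mean()
beta1_ols = (c @ (infrate - infrate.mean())) / (c @ c)

cov_conuse_u1 = np.cov(conuse, u1)[0, 1]
print(f"true beta1 = {beta1}, OLS estimate = {beta1_ols:.3f}")
```

Under these assumptions the OLS slope lies between the true negative value and zero, which is the attenuation toward zero described in part (iii).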

    D. Graddy and the Fulton Fish Market

    24 points total, 6 for each part

    This problem uses data from Graddy's paper on the Fulton Fish Market. These are posted on the web page.

    (1)

    Combine the data for Asians and Whites in a single time series and construct 2SLS estimates of the elasticity of overall demand for whiting. Your model should include day-of-the-week dummies as exogenous covariates. Do this two ways:

    (i) Manual 2SLS (i.e., run a second stage on first-stage fitted values with reg)

    (ii) Using ivreg

    See do and log files.

    (2)

    Compare the standard errors you got in (i) and (ii). What explains the difference? Compare the 2SLS estimates to OLS estimates.

    The two sets of point estimates are exactly the same, but the standard errors differ. Manual 2SLS reports the usual OLS standard errors from the second stage, which fail to take into account the fact that the fitted price $\hat{p}$ is estimated. The standard errors of ivregress account for the uncertainty in $\hat{p}$.
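The point about standard errors can be sketched in a few lines. The example uses simulated data, not the fish market data: the instrument, the coefficients, and the error structure are all assumptions for illustration. The second-stage regression on fitted values reproduces the 2SLS coefficients, but its residuals are computed with the fitted price, whereas the correct 2SLS residuals use the actual price. The degrees-of-freedom correction (n minus 2 for both) is a simplification; software conventions differ.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
z = rng.normal(size=n)                              # instrument (hypothetical shifter)
e = rng.normal(size=n)                              # demand shock
p = 0.7 * z + 0.8 * e + 1.5 * rng.normal(size=n)    # endogenous log price
q = 1.0 - 1.0 * p + e                               # log quantity, true elasticity -1

def ols(X, y):
    """Return OLS coefficients of y on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), p])

# First stage: fitted price.
p_hat = Z @ ols(Z, p)
X_hat = np.column_stack([np.ones(n), p_hat])

# Second stage on fitted values gives the 2SLS point estimates.
b = ols(X_hat, q)
XtX_inv = np.linalg.inv(X_hat.T @ X_hat)

# Manual second-stage residuals use p_hat; correct 2SLS residuals use actual p.
res_manual = q - X_hat @ b
res_2sls = q - X @ b
se_manual = np.sqrt(res_manual @ res_manual / (n - 2) * XtX_inv[1, 1])
se_2sls = np.sqrt(res_2sls @ res_2sls / (n - 2) * XtX_inv[1, 1])

print(f"elasticity = {b[1]:.3f}, manual SE = {se_manual:.4f}, 2SLS SE = {se_2sls:.4f}")
```

Both standard errors share the same coefficient vector and the same $(\hat{X}'\hat{X})^{-1}$ term; only the residual variance differs, which is why the point estimates match while the reported precision does not.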

    (3)

    The quantity of fish sold seems to differ from day to day. What must be true for you to be able to use these daily shifts to identify the supply equation? Using the day-of-the-week dummies as instruments, estimate the supply elasticity by two-stage least squares (using ivreg). Do the results appear to make sense? Is there any way you can test whether day of the week dummies are valid instruments for the supply equation? (Hint: Wooldridge Section 15.5).

    For the day of the week to be a valid instrument for supply, there must be no direct effect of day of the week on quantity supplied. The estimated supply elasticity is about 3. It is very imprecisely estimated with a standard error of 4. This elasticity is high, but that seems reasonable to me. In interpreting the estimate it is important to think about what type of elasticity it estimates. In the very short run, the elasticity of fish supply should be zero. If after fishing, a fisherman suddenly sees that demand is higher than expected, then he can sell his fish for a higher price, but has no time to go out and catch more fish. However, if a fisherman knows that demand is going to be higher tomorrow, then he can fish longer today to increase supply tomorrow. Hence, supply elasticity should be high for predictable changes in demand. Since the day of the week is predictable, the supply elasticity in response to demand changes with the day of the week should be high.

    Since we have one endogenous variable, price, and four day of the week dummies as instruments, the equation is over-identified, and we can test whether the instruments all give the same estimate. We do this and fail to reject the null hypothesis that the instruments are valid with a p-value of 0.46.
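One common form of the overidentification test regresses the 2SLS residuals on all the instruments and uses n times the R-squared as the statistic, compared against a chi-squared distribution with degrees of freedom equal to the number of overidentifying restrictions (three here, with four instruments and one endogenous variable). The sketch below is an assumption-laden illustration on simulated data with four continuous instruments rather than day-of-the-week dummies; `sargan_stat` is a hypothetical helper written for this example, not a library function.

```python
import numpy as np

def sargan_stat(y, x, Z):
    """n * R^2 from regressing 2SLS residuals on the instruments.

    y: outcome; x: single endogenous regressor; Z: instrument matrix
    without a constant. A constant is added in both stages.
    """
    n = len(y)
    W = np.column_stack([np.ones(n), Z])                # constant plus instruments
    x_hat = W @ np.linalg.lstsq(W, x, rcond=None)[0]    # first-stage fitted values
    X_hat = np.column_stack([np.ones(n), x_hat])
    b = np.linalg.lstsq(X_hat, y, rcond=None)[0]        # 2SLS coefficients
    resid = y - np.column_stack([np.ones(n), x]) @ b    # residuals use actual x
    fitted = W @ np.linalg.lstsq(W, resid, rcond=None)[0]
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    return n * r2

rng = np.random.default_rng(4)
n = 2000
Z = rng.normal(size=(n, 4))
u = rng.normal(size=n)
x = Z.sum(axis=1) + 0.5 * u + rng.normal(size=n)

y_valid = 1.0 + 2.0 * x + u              # instruments excluded from the equation
y_invalid = y_valid + 1.0 * Z[:, 0]      # first instrument has a direct effect

stat_valid = sargan_stat(y_valid, x, Z)
stat_invalid = sargan_stat(y_invalid, x, Z)
print(f"valid instruments: {stat_valid:.2f}, invalid instrument: {stat_invalid:.2f}")
```

The 5 percent critical value of a chi-squared distribution with three degrees of freedom is about 7.81. A statistic well above that, as in the deliberately misspecified case, signals that at least one instrument is correlated with the structural error.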

    (4)

    Jointly estimate separate demand equations for Asians and Whites using 3SLS. Test whether the elasticities are in fact equal across the two groups (use the test command). Discuss your results in light of the fact that Asians appear to pay less for fish than whites.

    The estimated elasticity for whites is -1 and for Asians is -1.5. A price discriminating firm would charge a higher price to less-elastic customers, so these elasticities are consistent with the fact that Asians appear to pay less. However, the estimated difference in elasticities is not statistically significant.

    E. Extra Credit

    20 points total, 4 from each part

    More on class size.[1] (This problem uses the same class size data you used to do question B in problem set III; these data are posted in the Angrist Data Archive).

    [1] This problem illustrates the regression discontinuity method of causal inference.

    (1) Recap

    Regress math and verbal scores on class size. Add controls for Percent Disadvantaged and enrollment in grade. How and why do the class size effects change when you add controls?

    Without controls, larger class sizes are associated with higher test scores. This is the opposite of what most people believe. We think that smaller classes should lead to better results. With controls, the coefficient on class size is nearly zero. This suggests that larger class sizes are associated with other things that raise test scores.

    (2)

    In Israel, class size is capped at integer multiples of 40. In other words, if there are 40 kids in your grade, you're in a class of 40 (usually), but if there are 41, the class is (usually) split. Assuming a new class is added every time enrollment exceeds an integer multiple of 40 generates the following predicted class size variable for class size in a school with enrollment $e_s$:

    \[ z_s = e_s / [\mathrm{INT}((e_s - 1)/40) + 1] \]

    Angrist and Lavy (1999) use $z_s$ as an instrumental variable for class size in regressions of test scores on enrollment and class size. They call $z_s$ Maimonides' Rule, because the medieval Talmudic scholar Moses Maimonides proposed that class size be capped at 40.

    (i) Explain the rationale for this IV strategy

    A valid instrument must be correlated with class size and uncorrelated with unobservables that affect test scores. Clearly, predicted class size, $z_s$, should be correlated with actual class size. Also, since $z_s$ is a function of enrollment, and we can control for enrollment, it is reasonable to think that $z_s$ would be uncorrelated with unobservables affecting test scores.

    (ii) Why is it a good idea to control for enrollment when using $z_s$ as an instrumental variable?

    It is a good idea to control for enrollment because otherwise $z_s$ would be unlikely to be exogenous. $z_s$ is clearly correlated with enrollment. From the above, enrollment appears to be correlated with test scores. If we do not control for enrollment, then it becomes part of the error term, and $z_s$ would be correlated with the error, making IV inconsistent.
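The predicted class size rule is easy to tabulate, which makes the discontinuities at enrollments of 41, 81, and 121 visible. The sketch below uses the integer part of (enrollment minus 1) divided by 40, so that an enrollment of exactly 40 yields one class of 40, as the problem statement describes; the function name `predicted_class_size` is an assumption made for this example.

```python
def predicted_class_size(enrollment: int, cap: int = 40) -> float:
    """Predicted class size when a grade is split each time enrollment exceeds a multiple of cap."""
    if enrollment <= 0:
        raise ValueError("enrollment must be positive")
    n_classes = (enrollment - 1) // cap + 1   # integer number of classes
    return enrollment / n_classes

for e in (40, 41, 80, 81, 120, 121):
    print(e, predicted_class_size(e))
```

Predicted class size drops from 40 to 20.5 when enrollment moves from 40 to 41, and from 40 to 27 when it moves from 80 to 81. These sharp drops at nearly identical enrollment levels are what give the instrument its regression discontinuity flavor once enrollment itself is controlled for.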

    (3)

    Replicate the reduced form estimates in Angrist and Lavy (1999) Table III, Columns 1-6 of Panel A.

    See do and log files.

    (4)

    Replicate the 2SLS estimates corresponding to these reduced form estimates.

    See do and log files.

    (5)

    Why do you think the 2SLS estimates show benefits from reducing class size while the OLS estimates do not, even with controls?

    Because there are unobserved variables that are correlated with class size that affect test scores.
