Answers Fall 10 Final

Fall 2010 ECON422 Answers - Final Exam


University of Maryland, Department of Economics
Prof. Razvan Vlaicu
December 15, 2010
Student Name: ____________________________

Answers - Final Exam

Instructions: The total number of points is 100. Please write legibly and present your reasoning clearly and concisely for each question.

Question 1 (10 pts.) PROBABILITY AND STATISTICS

(i) This is just a special case of what we covered in the text, with n = 4: E(Ȳ) = µ and Var(Ȳ) = σ²/4.

(ii) E(W) = E(Y₁)/8 + E(Y₂)/8 + E(Y₃)/4 + E(Y₄)/2 = µ[(1/8) + (1/8) + (1/4) + (1/2)] = µ(1 + 1 + 2 + 4)/8 = µ, which shows that W is unbiased. Because the Yᵢ are independent, Var(W) = Var(Y₁)/64 + Var(Y₂)/64 + Var(Y₃)/16 + Var(Y₄)/4 = σ²[(1/64) + (1/64) + (4/64) + (16/64)] = σ²(22/64) = σ²(11/32).

(iii) Because 11/32 > 8/32 = 1/4, Var(W) > Var(Ȳ) for any σ² > 0. Since both estimators are unbiased, Ȳ is preferred to W because it has the smaller variance.
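The arithmetic in parts (ii) and (iii) can be double-checked by treating each estimator as a weighted sum of the Yᵢ and computing its mean and variance factors exactly with rational arithmetic. This is a minimal Python sketch; the helper names `mean_factor` and `variance_factor` are illustrative, and it assumes, as in the answer, that the Yᵢ are independent with common mean µ and common variance σ².

```python
from fractions import Fraction

# Weights defining W = Y1/8 + Y2/8 + Y3/4 + Y4/2, as in part (ii).
weights_w = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)]
# The sample average Ybar puts weight 1/4 on each of the n = 4 draws.
weights_ybar = [Fraction(1, 4)] * 4

def mean_factor(weights):
    """E(sum a_i Y_i) = mu * sum(a_i); the estimator is unbiased iff this equals 1."""
    return sum(weights)

def variance_factor(weights):
    """For independent Y_i with common variance sigma^2, Var(sum a_i Y_i) = sigma^2 * sum(a_i^2)."""
    return sum(a * a for a in weights)

print(mean_factor(weights_w), variance_factor(weights_w))        # 1 11/32
print(mean_factor(weights_ybar), variance_factor(weights_ybar))  # 1 1/4
```

Exact fractions avoid rounding, so the comparison 11/32 > 1/4 is decided without any floating-point doubt.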



Question 2 (10 pts.) IDENTIFICATION

(i) We would want to randomly assign the number of hours in the preparation course so that hours is independent of other factors that affect performance on the SAT. Then, we would collect information on SAT score for each student in the experiment, yielding a data set {(satᵢ, hoursᵢ): i = 1, ..., n}, where n is the number of students we can afford to have in the study. From equation (2.7), we should try to get as much variation in hoursᵢ as is feasible.

(ii) Here are three factors: innate ability, family income, and general health on the day of the exam. If students with higher native intelligence believe they do not need to prepare for the SAT, then ability and hours will be negatively correlated. Family income would probably be positively correlated with hours, because higher-income families can more easily afford preparation courses. Ruling out chronic health problems, health on the day of the exam should be roughly uncorrelated with hours spent in a preparation course.

(iii) If preparation courses are effective, β₁ should be positive: other factors equal, an increase in hours should increase sat.

(iv) The intercept, β₀, has a useful interpretation in this example: because E(u) = 0, β₀ is the average SAT score for students in the population with hours = 0.
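The logic of parts (i) and (ii) can be illustrated with a small simulation. This is a hedged Python sketch, not part of the original answer: the population values (β₀ = 1000, β₁ = 2), the ability distribution, and the way observational hours depend on ability are all hypothetical. It shows that with random assignment the simple regression slope lands near β₁, while a negative correlation between ability (part of u) and hours pulls the slope downward.

```python
import random

def ols_simple(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical population model: sat = BETA0 + BETA1 * hours + u, with u = ability.
BETA0, BETA1 = 1000.0, 2.0

rng = random.Random(422)
n = 5000
ability = [rng.gauss(0, 20) for _ in range(n)]  # unobserved; plays the role of u, E(u) = 0

# Randomized experiment: hours assigned independently of ability, over a wide range.
hours_rand = [rng.uniform(0, 40) for _ in range(n)]
# Observational data: abler students choose fewer hours (negative correlation).
hours_obs = [max(0.0, 20 - 0.2 * a + rng.uniform(-10, 10)) for a in ability]

def sat_scores(hours):
    return [BETA0 + BETA1 * h + a for h, a in zip(hours, ability)]

print(ols_simple(hours_rand, sat_scores(hours_rand)))  # slope should be close to 2
print(ols_simple(hours_obs, sat_scores(hours_obs)))    # slope pulled well below 2
```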



Question 3 (15 pts.) ESTIMATION

(i) β₁ < 0 because more pollution can be expected to lower housing values; note that β₁ is the elasticity of price with respect to nox. β₂ is probably positive because rooms roughly measures the size of a house. (However, it does not allow us to distinguish homes where each room is large from homes where each room is small.)

(ii) If we assume that rooms increases with the quality of the home, then log(nox) and rooms are negatively correlated when poorer neighborhoods have more pollution, something that is often true. We can use Table 3.2 to determine the direction of the bias. With x₁ = log(nox) and x₂ = rooms, if β₂ > 0 and Corr(x₁, x₂) < 0, the simple regression estimator β̃₁ has a downward bias. But because β₁ < 0, this means that the simple regression, on average, overstates the importance of pollution. [E(β̃₁) is more negative than β₁.]

(iii) This is what we expect from the typical sample based on our analysis in part (ii). The simple regression estimate, −1.043, is more negative (larger in magnitude) than the multiple regression estimate, −.718. As those estimates are only for one sample, we can never know which is closer to β₁. But if this is a "typical" sample, β₁ is closer to −.718.
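The direction-of-bias argument in part (ii) rests on the omitted-variable relation behind Table 3.2: in any sample, β̃₁ = β̂₁ + β̂₂δ̃₁, where δ̃₁ is the slope from regressing x₂ on x₁. The Python sketch below checks that identity on simulated data. The coefficients and distributions are hypothetical, chosen only so that β₂ > 0 and Corr(x₁, x₂) < 0 as in the answer; they are not the data behind the −1.043 and −.718 estimates.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cross(a, b):
    """Sum of cross-products of deviations from the mean."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

def simple_slope(x, y):
    return cross(x, y) / cross(x, x)

def two_regressor_slopes(x1, x2, y):
    """OLS slopes (b1, b2) from regressing y on an intercept, x1 and x2."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical model: log(price) = 11 - 0.7*log(nox) + 0.3*rooms + u,
# with fewer rooms where pollution is higher (Corr(x1, x2) < 0).
BETA1, BETA2 = -0.7, 0.3
rng = random.Random(3)
n = 2000
lnox = [rng.gauss(1.7, 0.2) for _ in range(n)]
rooms = [6 - 1.5 * (x - 1.7) + rng.gauss(0, 0.6) for x in lnox]
lprice = [11 + BETA1 * x1 + BETA2 * x2 + rng.gauss(0, 0.2) for x1, x2 in zip(lnox, rooms)]

b1_tilde = simple_slope(lnox, lprice)                # simple regression, rooms omitted
b1_hat, b2_hat = two_regressor_slopes(lnox, rooms, lprice)
delta1 = simple_slope(lnox, rooms)                   # slope of rooms on log(nox)

print(round(b1_tilde, 3), round(b1_hat, 3))          # simple estimate is more negative
print(abs(b1_tilde - (b1_hat + b2_hat * delta1)))    # ~0: the identity holds in-sample
```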



Question 4 (15 pts.) INFERENCE

Note: Please use a critical t-value of 2, and a critical F-value of 3.

(i) With df = n – 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because each test is two-tailed, the critical value is 1.987. The t statistic for H0: β₀ = 0 is about −.89, which is much less than 1.987 in absolute value. Therefore, we fail to reject β₀ = 0. The t statistic for H0: β₁ = 1 is (.976 – 1)/.049 ≈ −.49, which is even less significant. (Remember, we reject H0 in favor of H1 in this case only if |t| > 1.987.)

(ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions and the df in the unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore,

$$F = \frac{209{,}448.99 - 165{,}644.51}{165{,}644.51} \cdot \frac{86}{2} \approx 11.37,$$

which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.
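The two statistics in parts (i) and (ii) each reduce to one line of arithmetic. This is a minimal Python sketch; the helper names `t_stat` and `f_stat_ssr` are illustrative, the inputs are the estimates and SSRs quoted above, and the comparisons use the critical values from the answer (1.987 for the two-tailed t test, 4.85 for the 1% F test).

```python
def t_stat(estimate, hypothesized, std_err):
    """t statistic for H0: beta = hypothesized."""
    return (estimate - hypothesized) / std_err

def f_stat_ssr(ssr_r, ssr_ur, q, df_ur):
    """SSR form of the F statistic: [(SSR_r - SSR_ur) / SSR_ur] * (df_ur / q)."""
    return ((ssr_r - ssr_ur) / ssr_ur) * (df_ur / q)

t_beta1 = t_stat(0.976, 1.0, 0.049)                          # H0: beta1 = 1
f_joint = f_stat_ssr(209_448.99, 165_644.51, q=2, df_ur=86)  # q = 2 restrictions

print(round(t_beta1, 2), abs(t_beta1) > 1.987)  # -0.49 False -> fail to reject
print(round(f_joint, 2), f_joint > 4.85)        # 11.37 True  -> reject at 1%
```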



(iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions and there are 88 – 5 = 83 df in the unrestricted model. The F statistic is [(.829 – .820)/(1 – .829)](83/3) ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 even at the 10% level. In fact, the p-value is about .23.

(iv) If heteroskedasticity were present, Assumption MLR.5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.

Question 5 (15 pts.) FUNCTIONAL FORMS, UNITS OF MEASUREMENT

(i) The turnaround point is given by β̂₁/(2|β̂₂|), or .0003/(.000000014) ≈ 21,428.57; remember, this is sales in millions of dollars.

(ii) Probably. The t statistic on sales² is about –1.89, which is significant against the one-sided alternative H1: β₂ < 0 at the 5% level (cv ≈ –1.70 with df = 29). In fact, the p-value is about .036.

(iii) Because sales gets divided by 1,000 to obtain salesbil, the corresponding coefficient gets multiplied by 1,000: (1,000)(.00030) = .30. The standard error gets multiplied by the same factor. As stated in the hint, salesbil² = sales²/1,000,000, and so the coefficient on the quadratic gets multiplied by one million: (1,000,000)(.0000000070) = .0070; its standard error also gets multiplied by one million. Nothing happens to the intercept (because rdintens has not been rescaled) or to the R²:

$$\widehat{rdintens} = \underset{(0.429)}{2.613} + \underset{(.14)}{.30}\,salesbil - \underset{(.0037)}{.0070}\,salesbil^2$$

n = 32, R² = .1484.
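Parts (i) and (iii) can be checked together: rescaling sales from millions to billions should change the units of the turnaround point but not its location. A short Python sketch using the coefficients quoted above; the helper `turnaround` is illustrative.

```python
# Estimates with sales measured in millions of dollars (part (i)).
b_sales, b_sales_sq = 0.00030, -0.0000000070

def turnaround(b_lin, b_quad):
    """Value of x where b_lin*x + b_quad*x**2 peaks: b_lin / (2|b_quad|)."""
    return b_lin / (2 * abs(b_quad))

# salesbil = sales/1,000 and salesbil**2 = sales**2/1,000,000, so the
# coefficients are multiplied by 1,000 and 1,000,000 respectively (part (iii)).
b_bil, b_bil_sq = b_sales * 1_000, b_sales_sq * 1_000_000

print(round(turnaround(b_sales, b_sales_sq), 2))  # 21428.57 (millions of dollars)
print(round(turnaround(b_bil, b_bil_sq), 5))      # 21.42857 (billions of dollars)
```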



(iv) The equation in part (iii) is easier to read because it contains fewer zeros to the right of the decimal. Of course, the interpretation of the two equations is identical once the different scales are accounted for.

Question 6 (15 pts.) DUMMY VARIABLES

(i) The t statistic on hsize² is over four in absolute value, so there is very strong evidence that it belongs in the equation. We obtain the optimal size of the graduating class by finding the turnaround point; this is the value of hsize that maximizes the predicted SAT score (other things fixed): 19.3/(2 ⋅ 2.19) ≈ 4.41. Because hsize is measured in hundreds, the optimal size of the graduating class is about 441.

(ii) The estimated difference between nonblack females and nonblack males is given by the coefficient on female (since black = 0): nonblack females have SAT scores about 45 points lower than nonblack males. The t statistic is about –10.51, so the difference is very statistically significant. (The very large sample size certainly contributes to the statistical significance.)

(iii) Because female = 0, the coefficient on black implies that a black male has an estimated SAT score almost 170 points less than a comparable nonblack male. The t statistic is over 13 in absolute value, so we easily reject the hypothesis that there is no ceteris paribus difference.

(iv) We plug in black = 1, female = 1 for black females and black = 0, female = 1 for nonblack females. The difference is therefore –169.81 + 62.31 = −107.50. Because the estimate depends on two coefficients, we cannot construct a t statistic from the information given. The easiest approach is to define dummy variables for three of the four race/gender categories and choose nonblack females as the base group. We can then obtain the t statistic we want as the t statistic on the black female dummy variable.

Question 7 (10 pts.) HETEROSKEDASTICITY

How would you estimate this model by WLS? (One-line answer, please.)

Parts (ii) and (iii) are consequences of heteroskedasticity. The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE. With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:

$$\frac{beer}{inc} = \beta_0 (1/inc) + \beta_1 + \beta_2 (price/inc) + \beta_3 (educ/inc) + \beta_4 (female/inc) + (u/inc).$$

Notice that β₁, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.

WLS: Estimate the transformed model by OLS.
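The one-line WLS answer can be made concrete. The Python sketch below is illustrative only: it assumes the original model is beer = β₀ + β₁inc + β₂price + β₃educ + β₄female + u (the form implied by the transformed equation), uses a small hand-written least-squares solver rather than any particular library, and simulates data from hypothetical coefficients. The transformed regression is run without adding an intercept, because the column of ones already carries β₁.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for r in reversed(range(k)):  # back substitution
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x

def ols_no_constant(rows, y):
    """OLS of y on the given columns only (no intercept added), via X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def wls_beer(beer, inc, price, educ, female):
    """WLS when Var(u | x) = sigma^2 * inc^2: divide every term by inc, then run OLS.
    Returns [beta0, beta1, beta2, beta3, beta4] in the original model's notation."""
    rows = [[1 / i, 1.0, p / i, e / i, f / i]
            for i, p, e, f in zip(inc, price, educ, female)]
    y = [b / i for b, i in zip(beer, inc)]
    return ols_no_constant(rows, y)

# Simulated data; every number below is hypothetical.
rng = random.Random(8)
n = 4000
inc = [rng.uniform(10, 100) for _ in range(n)]
price = [rng.uniform(1, 5) for _ in range(n)]
educ = [rng.randint(8, 20) for _ in range(n)]
female = [rng.randint(0, 1) for _ in range(n)]
true_beta = [2.0, 0.01, -0.5, -0.1, -1.0]

def simulate_beer(noise_sd):
    # u = inc * v with v ~ N(0, noise_sd^2), so Var(u | x) = noise_sd^2 * inc^2.
    return [true_beta[0] + true_beta[1] * i + true_beta[2] * p + true_beta[3] * e
            + true_beta[4] * f + i * rng.gauss(0, noise_sd)
            for i, p, e, f in zip(inc, price, educ, female)]

print([round(b, 3) for b in wls_beer(simulate_beer(0.05), inc, price, educ, female)])
```

With noisy data the printed estimates should lie near `true_beta` up to sampling error; with `noise_sd = 0` the transformed equation holds exactly and WLS recovers the coefficients to rounding precision.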



Question 8 (10 pts.) TIME SERIES

State whether you AGREE or DISAGREE with the following statements, and provide a brief explanation for your statement (at most three lines).

(i) The threat of confounds is bigger with time series data than it is with cross-sectional data.
AGREE. An explanatory variable can be its own confound, because its lags and leads may be correlated with the dependent variable.

(ii) Like cross-sectional data, time series data has the property that the observations are independent of each other.
DISAGREE. Time series observations are related to each other by chronology.

(iii) Two time-series variables that follow the same trend are correlated.
AGREE. When one goes up, the other tends to go up as well. That's correlation.

(iv) Seasonality is not an issue when using annual time series observations.
AGREE. Seasonal effects occur within a year, so they can be detected only in monthly or quarterly data.

(v) OLS can be used to analyze certain types of time-series data.
AGREE. OLS is an appropriate estimation method for time-series data that satisfies strict exogeneity and no serial correlation.
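Statement (iii) can be illustrated numerically. In the Python sketch below (simulated data; the trend slopes, noise range, and sample length are hypothetical), two series whose deviations from trend are generated independently still show a high sample correlation because both trend upward. Removing a linear trend from each, here by regressing on a constant and t, leaves correlation close to zero.

```python
import random

def corr(a, b):
    """Sample correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def detrend(series):
    """Residuals from regressing the series on a constant and a linear trend t = 0, 1, 2, ..."""
    t = list(range(len(series)))
    mt, ms = sum(t) / len(t), sum(series) / len(series)
    slope = (sum((ti - mt) * (si - ms) for ti, si in zip(t, series))
             / sum((ti - mt) ** 2 for ti in t))
    return [si - ms - slope * (ti - mt) for ti, si in zip(t, series)]

rng = random.Random(10)
T = 100
# Two hypothetical series sharing an upward trend; fluctuations are independent.
x = [1.0 * t + rng.uniform(-5, 5) for t in range(T)]
y = [0.5 * t + rng.uniform(-5, 5) for t in range(T)]

print(round(corr(x, y), 3))                    # large and positive: common trend
print(round(corr(detrend(x), detrend(y)), 3))  # near zero once the trend is removed
```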
