Omitted Variables


Transcript of Omitted Variables

Page 1: Omitted Variables

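True Model:

$$g_i = \beta_0 + \beta_1 x_i + \beta_2 A_i + v_i \qquad (1)$$

where
• $g_i$ is the student's cumulative college GPA (1-4)
• $x_i$ is the student's high school GPA (1-4)
• $A_i$ is the student's academic ability
• $v_i$ is a stochastic error term with $E(v_i \mid x_i, A_i) = 0$

Empirical Specification:

$$g_i = \beta_0 + \beta_1 x_i + \varepsilon_i \qquad (2)$$

where $\varepsilon_i = \beta_2 A_i + v_i$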

Page 2: Omitted Variables

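Empirical Specification:

$$g_i = \beta_0 + \beta_1 x_i + \varepsilon_i \qquad (2)$$

where $\varepsilon_i = \beta_2 A_i + v_i$. Taking expectations conditional on $x_i$:

$$E(g_i \mid x_i) = \beta_0 + \beta_1 x_i + E(\varepsilon_i \mid x_i)$$

$$E(\varepsilon_i \mid x_i) = \beta_2 E(A_i \mid x_i) + E(v_i \mid x_i) = \beta_2 \cdot \frac{\operatorname{cov}(A_i, x_i)}{\operatorname{var}(x_i)} \, x_i$$

The last step uses $E(v_i \mid x_i) = 0$ and treats $E(A_i \mid x_i)$ as linear in $x_i$, with any intercept folded into $\beta_0$. Therefore

$$E(g_i \mid x_i) = \beta_0 + \left(\beta_1 + \beta_2 \cdot \frac{\operatorname{cov}(A_i, x_i)}{\operatorname{var}(x_i)}\right) x_i \qquad (3)$$

Hence, we can re-write (2) as

$$g_i = \beta_0 + \left(\beta_1 + \beta_2 \cdot \frac{\operatorname{cov}(A_i, x_i)}{\operatorname{var}(x_i)}\right) x_i + v_i$$

where $E(v_i \mid x_i) = 0$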
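The bias formula can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: every parameter value, the function name `simulate_ovb`, and the normal distributions are illustrative assumptions (the simulated GPAs are not restricted to the 1-4 range).

```python
import numpy as np


def simulate_ovb(n=200_000, beta0=1.0, beta1=0.4, beta2=0.5, seed=0):
    """Simulate the true model, then regress g on x alone (ability omitted).

    All numeric values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, 1.0, n)                    # ability, mean zero
    x = 3.0 + 0.3 * A + rng.normal(0.0, 0.4, n)    # high school GPA, correlated with A
    v = rng.normal(0.0, 0.3, n)                    # error, independent of x and A
    g = beta0 + beta1 * x + beta2 * A + v          # true model (1)

    var_x = np.var(x, ddof=1)
    # OLS slope from the short regression of g on x
    slope_short = np.cov(g, x)[0, 1] / var_x
    # Slope predicted by the omitted-variable formula
    slope_formula = beta1 + beta2 * np.cov(A, x)[0, 1] / var_x
    return slope_short, slope_formula


slope_short, slope_formula = simulate_ovb()
print(f"short-regression slope: {slope_short:.3f}")
print(f"beta1 + beta2*cov/var:  {slope_formula:.3f}")
```

Under these assumed values the two numbers should nearly coincide, and both should sit well above the assumed $\beta_1 = 0.4$, because ability and high school GPA are positively correlated in the simulated data.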

Page 3: Omitted Variables

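Regressing $g_i$ on $x_i$ produces $\hat{\beta}_1$, where

$$E(\hat{\beta}_1 \mid x_i) = \beta_1 + \beta_2 \cdot \frac{\operatorname{cov}(A_i, x_i)}{\operatorname{var}(x_i)}$$

• $\beta_1$: what we hoped to estimate
• $\beta_2$ (+): Students with greater ability are expected to have higher college GPAs.
• $\operatorname{cov}(A_i, x_i)$ (+): Students with greater ability are expected to have higher high school GPAs.
• $\operatorname{var}(x_i)$ (+): Variances are always positive.

$\hat{\beta}_1$ is upward biased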

Page 4: Omitted Variables

[Diagram labels: $\beta_1$; $\beta_2$ (omitted $A_i$); $\operatorname{cov}(A_i, x_i) / \operatorname{var}(x_i)$]

Students who work harder in high school and, as a result, earn higher grades are likely to earn higher college grades, i.e., $\beta_1 > 0$. But if we don't control for students' academic abilities, high school grades will appear more important than they really are, because higher-ability students are likely to earn both higher high school grades ($\operatorname{cov}(A_i, x_i) > 0$) and higher college grades ($\beta_2 > 0$). Hence, higher high school grades explain higher college grades both directly and because they are a proxy for the omitted variable, ability.

Page 5: Omitted Variables

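Suppose the SAT is a proxy variable for ability. In particular, suppose

$$A_i = \delta_0 + \delta_1 S_i + u_i$$

where $S_i$ is the student's SAT score and $E(u_i \mid x_i, S_i) = 0$. In this case, we are assuming that students' high school grades don't help explain their academic abilities once the SAT is accounted for.

True Model:

$$g_i = \beta_0 + \beta_1 x_i + \beta_2 A_i + v_i \qquad (1)$$

Substituting the proxy equation into (1), we have

$$g_i = (\beta_0 + \beta_2 \delta_0) + \beta_1 x_i + (\beta_2 \delta_1) S_i + (v_i + \beta_2 u_i)$$

Rewriting,

$$g_i = \beta_0^* + \beta_1 x_i + \beta_2^* S_i + v_i^* \qquad (3)$$

where $\beta_0^* = \beta_0 + \beta_2 \delta_0$, $\beta_2^* = \beta_2 \delta_1$, and $v_i^* = v_i + \beta_2 u_i$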
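A small simulation can illustrate why adding the proxy removes the bias in the coefficient on high school GPA. This is a sketch under assumed values, not the author's data: the helper `ols`, the function `simulate_proxy`, the standardized SAT score, and all parameter values are hypothetical.

```python
import numpy as np


def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate_proxy(n=200_000, beta1=0.4, beta2=0.5, delta1=0.8, seed=1):
    """Compare the short regression with the proxy regression (hypothetical values)."""
    rng = np.random.default_rng(seed)
    S = rng.normal(0.0, 1.0, n)                   # SAT score, standardized (assumption)
    u = rng.normal(0.0, 0.5, n)                   # ability not captured by the SAT
    A = delta1 * S + u                            # proxy equation with delta0 = 0
    x = 3.0 + 0.3 * S + rng.normal(0.0, 0.4, n)   # x relates to A only through S
    v = rng.normal(0.0, 0.3, n)
    g = 1.0 + beta1 * x + beta2 * A + v           # true model (1)

    b_short = ols(g, x)[1]                        # ability omitted, no proxy
    b_proxy = ols(g, x, S)                        # SAT included as a proxy
    return b_short, b_proxy[1], b_proxy[2]


b_short, b_x, b_S = simulate_proxy()
print(f"slope on x without SAT: {b_short:.3f}")
print(f"slope on x with SAT:    {b_x:.3f}")
print(f"slope on SAT:           {b_S:.3f}")
```

With these assumed values, the slope on $x_i$ in the proxy regression should land close to the assumed $\beta_1 = 0.4$, and the slope on $S_i$ close to $\beta_2 \delta_1 = 0.4$, while the short regression overstates $\beta_1$.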

Page 6: Omitted Variables

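Two Empirical Specifications

$$g_i = \beta_0 + \beta_1 x_i + \varepsilon_i \qquad (2)$$

where $\varepsilon_i = \beta_2 A_i + v_i$

$$g_i = \beta_0^* + \beta_1 x_i + \beta_2^* S_i + v_i^* \qquad (3)$$

$$E(\hat{\beta}_1^{(2)} \mid x_i) = \beta_1 + \beta_2 \cdot \frac{\operatorname{cov}(A_i, x_i)}{\operatorname{var}(x_i)} \; > \; E(\hat{\beta}_1^{(3)} \mid x_i, S_i) = \beta_1$$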

Dependent Variable: College Grade Point Average (4 point scale)

| | All Students | All Students | Whites Only | Non-Whites | All Students |
|---|---|---|---|---|---|
| High School Grade Point Average | 0.548*** | 0.374*** | | | |
| | (20.18) | (13.54) | | | |
| Scholastic Aptitude Test (SAT) | | 0.090*** | | | |
| | | (16.49) | | | |
| White-Non-Hispanic (1=yes) | | | | | |
| Interaction term: | | | | | |
| Interaction term: | | | | | |
| Constant | 1.223*** | 0.688*** | | | |
| | (12.02) | (6.81) | | | |
| Observations | 2096 | 2096 | | | |
| R-squared | 0.163 | 0.259 | | | |

Absolute value of t-statistics in parentheses
* significant at 10%; ** significant at 5%; *** significant at 1%

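As expected, the estimated coefficient on high school GPA falls (from 0.548 to 0.374) once the SAT is included. People often tell high school students that they need to study hard to eventually do well in college. The corresponding estimate is $\hat{\beta}_1$. What is the interpretation of $\hat{\beta}_1$?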

Page 7: Omitted Variables

Sitting outside of High School

Page 8: Omitted Variables

Dependent Variable: College Grade Point Average (4 point scale)

| | All Students | All Students | Whites Only | Non-Whites | All Students |
|---|---|---|---|---|---|
| High School Grade Point Average | | | 0.407*** | 0.353*** | 0.353*** |
| | | | (7.10) | (11.16) | (11.31) |
| Scholastic Aptitude Test (SAT) | | | 0.072*** | 0.091*** | 0.091*** |
| | | | (6.35) | (14.21) | (14.40) |
| White-Non-Hispanic (1=yes) | | | | | 0.136 |
| | | | | | (0.51) |
| Interaction term: | | | | | 0.053 |
| | | | | | (0.79) |
| Interaction term: | | | | | -0.019 |
| | | | | | (1.46) |
| Constant | | | 0.874*** | 0.738*** | 0.738*** |
| | | | (3.69) | (6.50) | (6.58) |
| Observations | | | 593 | 1503 | 2096 |
| R-squared | | | 0.165 | 0.266 | 0.267 |

Absolute value of t-statistics in parentheses
* significant at 10%; ** significant at 5%; *** significant at 1%

Whites Only and Non-Whites columns: Regressions run on subsamples of whites and non-whites
Final All Students column: Fully Interacted Model

Non-Hispanic-white students appear to get a bigger boost from studying hard in high school than non-white students. But is the difference statistically significant?

$$0.407 - 0.353 = 0.054$$

But the difference is not statistically significant!
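The link between the subsample regressions and the fully interacted model can be sketched in code. In a fully interacted regression, the interaction coefficient equals the difference between the two subsample slopes, which is consistent with the table's 0.053 matching 0.407 - 0.353 up to rounding. The example below uses simulated data with a single regressor for brevity; the function names and all parameter values are hypothetical.

```python
import numpy as np


def ols(y, X):
    """Return OLS coefficients for design matrix X (which includes a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def compare_subsample_and_interacted(n=5_000, seed=2):
    """Fit two subsample regressions and one fully interacted regression."""
    rng = np.random.default_rng(seed)
    w = rng.integers(0, 2, n).astype(float)   # group dummy (1 = group of interest)
    x = rng.normal(3.0, 0.5, n)               # single regressor (hypothetical)
    g = 0.7 + 0.35 * x + w * (0.1 + 0.05 * x) + rng.normal(0.0, 0.5, n)
    ones = np.ones(n)

    in_group = w == 1
    b_group = ols(g[in_group], np.column_stack([ones[in_group], x[in_group]]))
    b_other = ols(g[~in_group], np.column_stack([ones[~in_group], x[~in_group]]))

    # Fully interacted model: constant, x, dummy, dummy * x
    b_full = ols(g, np.column_stack([ones, x, w, w * x]))
    return b_group, b_other, b_full


b_group, b_other, b_full = compare_subsample_and_interacted()
print("difference in subsample slopes:", b_group[1] - b_other[1])
print("interaction coefficient:       ", b_full[3])
```

The advantage of the fully interacted model is that it reports a standard error for the difference directly, so the significance test comes straight from the interaction term's t-statistic.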

Page 9: Omitted Variables

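Omitted Variables: Youth Smoking and Anti-Smoking Sentiment

True Model:

$$\ln Q_{is} = \beta_0 + \beta_1 \ln P_{is} + \beta_2 AS_s + v_{is}$$

where
• $Q_{is}$ = cigarettes smoked per day
• $P_{is}$ = price per pack
• $AS_s$ = anti-smoking sentiment in state $s$
• $v_{is}$ = stochastic error term with $E(v_{is} \mid \ln P_{is}, AS_s) = 0$

Empirical Specification:

$$\ln Q_{is} = \beta_0 + \beta_1 \ln P_{is} + \varepsilon_{is}$$

where $\varepsilon_{is} = \beta_2 AS_s + v_{is}$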

Page 10: Omitted Variables

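$$E(\hat{\beta}_1 \mid \ln P_{is}) = \beta_1 + \beta_2 \cdot \frac{\operatorname{cov}(AS_s, \ln P_{is})}{\operatorname{var}(\ln P_{is})}$$

• $\beta_1$: what we hoped to estimate
• $\beta_2$ (−): Stronger anti-smoking sentiment leads to less smoking, e.g., less smoking in public places.
• $\operatorname{cov}(AS_s, \ln P_{is})$ (+): Stronger anti-smoking sentiment leads to higher cigarette taxes.
• $\operatorname{var}(\ln P_{is})$ (+): Variances are always positive.

Sign of the bias term: (−) · (+) / (+) = (−)

$\hat{\beta}_1$ is downward biased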
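The sign argument (and the mirror-image argument for the GPA example) reduces to the sign of $\beta_2$ times the sign of the covariance, since the variance is always positive. A minimal sketch follows; the function names and the numbers passed in are hypothetical and serve only to illustrate the signs.

```python
def expected_short_slope(beta1, beta2, cov_omitted_x, var_x):
    """Expected slope when the omitted variable is left out of the regression."""
    if var_x <= 0:
        raise ValueError("var_x must be positive")
    return beta1 + beta2 * cov_omitted_x / var_x


def bias_direction(beta2, cov_omitted_x):
    """Direction of omitted-variable bias; the variance in the denominator is positive."""
    product = beta2 * cov_omitted_x
    if product > 0:
        return "upward"
    if product < 0:
        return "downward"
    return "none"


# Smoking example: beta2 < 0 (sentiment lowers smoking), cov > 0 (sentiment raises prices)
print(bias_direction(beta2=-0.2, cov_omitted_x=0.1))   # downward
# GPA example: beta2 > 0 (ability raises college GPA), cov > 0
print(bias_direction(beta2=0.5, cov_omitted_x=0.3))    # upward
```

For a price elasticity, a downward bias means the estimate is more negative than the true $\beta_1$, so demand looks more price-sensitive than it is.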

Page 11: Omitted Variables

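Dependent Variable: ln(Quantity of cigarettes per day)

| | All Smokers | All Smokers | Female Smokers | Male Smokers | All Smokers |
|---|---|---|---|---|---|
| Ln(Price per pack) | -0.350 | -0.310 | | | |
| | (0.150) | (0.162) | | | |
| Anti-smoking sentiment in state | | -0.160 | | | |
| | | (0.240) | | | |
| Female (1=yes) | | | | | |
| Interaction term: | | | | | |
| Constant | 2.780 | 2.741 | | | |
| | (0.209) | (0.217) | | | |
| Observations | 329 | 329 | | | |
| R-squared | 0.016 | 0.018 | | | |

Standard errors in parentheses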

I chose to present standard errors because it is most natural to test whether our estimate implies that young smokers’ demand for cigarettes is inelastic, which requires we calculate a different t-stat than the one produced by standard software programs.
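As a sketch of the calculation this describes: standard software reports the t-statistic for the null hypothesis that the coefficient is zero, whereas a test of inelastic demand compares the price elasticity with -1. Taking -1 as the null value is an assumption about the intended test, and the helper `t_stat` is hypothetical; the estimate and standard error are the ones reported in the specification that includes anti-smoking sentiment.

```python
def t_stat(estimate, std_error, null_value=0.0):
    """t-statistic for the null hypothesis that the coefficient equals null_value."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    return (estimate - null_value) / std_error


# Reported estimate and standard error for Ln(Price per pack)
b_price, se_price = -0.310, 0.162

t_zero = t_stat(b_price, se_price)                    # software default: H0 is beta1 = 0
t_unit = t_stat(b_price, se_price, null_value=-1.0)   # assumed null: unit-elastic demand

print(f"t-stat against  0: {t_zero:.2f}")
print(f"t-stat against -1: {t_unit:.2f}")
```

Reporting standard errors rather than t-statistics lets a reader redo this calculation for any null value of interest.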

Page 12: Omitted Variables

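Dependent Variable: ln(Quantity of cigarettes per day)

| | All Smokers | All Smokers | Female Smokers | Male Smokers | All Smokers |
|---|---|---|---|---|---|
| Ln(Price per pack) | | | -0.267 | -0.443 | -0.443 |
| | | | (0.210) | (0.212) | (0.206) |
| Anti-smoking sentiment in state | | | | | |
| Female (1=yes) | | | | | -0.357 |
| | | | | | (0.417) |
| Interaction term: | | | | | 0.176 |
| | | | | | (0.299) |
| Constant | | | 2.605 | 2.962 | 2.962 |
| | | | (0.291) | (0.298) | (0.289) |
| Observations | | | 156 | 173 | 329 |
| R-squared | | | 0.010 | 0.025 | 0.026 |

Standard errors in parentheses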

It appears that young women's demand for cigarettes is more inelastic than that of young men.

$$-0.267 - (-0.443) = 0.176$$

$$t\text{-stat} = \frac{0.176}{0.299} = 0.59$$

But the difference is not statistically significant!
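The same conclusion can be approximated from the two subsample regressions alone. The sketch below assumes the female and male estimates are independent, so the standard error of their difference is the square root of the sum of the squared standard errors. This is an approximation, not necessarily how the slide's 0.299 was computed; that figure is the standard error reported for the interaction term. The function name is hypothetical.

```python
import math


def diff_t_stat(b_a, se_a, b_b, se_b):
    """Difference between two independent estimates, its standard error, and its t-stat."""
    diff = b_a - b_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, diff / se_diff


# Reported price coefficients: female smokers, then male smokers
diff, se_diff, t = diff_t_stat(-0.267, 0.210, -0.443, 0.212)

print(f"difference: {diff:.3f}")
print(f"approximate standard error: {se_diff:.3f}")
print(f"t-stat: {t:.2f}")
print("significant at 5% (|t| > 1.96):", abs(t) > 1.96)
```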