Lecture 1: Introduction, Regressions and Causal Inference January 10, 2016



Introduction Regressions Causal Inference Control Variables Randomized Experiments

Introduction

From PPG1004H you should understand the following concepts:

1 Understand means and variances
2 Understand p-values and Hypothesis Tests
3 Underlying Regression Assumptions (and why $E[U \mid X] = 0$ is so important)
4 Understand what controls do
5 Fixed-effects
6 Interpreting Regression Coefficients
7 The Basics of Causal Inference
    This will be the focus of next course


Quick Review

p-values

Definition: The p-value is the probability of getting results at least as extreme as the ones you observed, given that the null hypothesis is correct

It can’t tell you the magnitude of an effect, the strength of the evidence or the probability that the finding was the result of chance.

Layman’s explanation: You suspect a coin is weighted toward heads (therefore set $H_0: p = 0.5$). You flip it 100 times and get more heads than tails. The p-value won’t tell you whether the coin is fair, but it will tell you the probability that you’d get at least as many heads as you did if the coin was fair. That’s it.
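The coin example can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the 100 flips produced 60 heads (the slide gives no count) and computes the exact one-sided p-value under the null of a fair coin from the binomial distribution. The function name `one_sided_p_value` is hypothetical.

```python
from math import comb

def one_sided_p_value(heads, flips, p_null=0.5):
    """P(at least `heads` heads in `flips` flips) if the null hypothesis is true."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# Assumed outcome for illustration: 60 heads in 100 flips.
p_value = one_sided_p_value(heads=60, flips=100)
print(f"p-value = {p_value:.4f}")
```

Under these assumed numbers the p-value is a little under 0.03, so the result would be statistically significant at the 5% level. It is still not the probability that the coin is fair.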


Quick Review

P-value and Economic vs. Statistical Significance

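Statistical Significance: If the p-value < 0.05, then your result is statistically significant (at the conventional 5% level)

Economic Significance: We could not care less about the p-value; what matters is whether the size of the effect is large enough to matter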


Regressions

Regression: a measure of the relation between the mean value of one variable and corresponding values of other variables

There are many types of regressions (logit, probit, IV)
We focus on Ordinary Least Squares (OLS) regressions

OLS: Minimizes the sum of squared differences between the observed responses and the responses predicted by a linear regression model

$Y_i = \alpha + \beta_1 X_{1i} + \varepsilon_i$ ⟹ “univariate” regression
$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$ ⟹ “multivariate” regression
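To make the least-squares idea concrete, here is a minimal sketch of the univariate case using the closed-form solution: the slope estimate is Cov(X1, Y) / Var(X1) and the intercept estimate is the mean of Y minus the slope times the mean of X1. The function name `ols_univariate` and the data are made up for illustration; in practice you would use a statistics package.

```python
def ols_univariate(x, y):
    """Return (alpha_hat, beta_hat) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares (the 1/n factors cancel in the ratio).
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# Illustrative data (assumed): y is roughly 1 + 2x plus small noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
alpha_hat, beta_hat = ols_univariate(x, y)
print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")
```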


Regressions and STATA

A regression equation tells you what to write in STATA

$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

In STATA:


Unit of Analysis

Before writing down a regression equation, know the unit of analysis
Often used subscripts: i = individual, t = time, s = school, g = grade, p = province

$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

Unit is:

$Y_{pt} = \alpha + \beta_1 X_{1pt} + \beta_2 X_{2pt} + \varepsilon_{pt}$

Unit is:


Unit of Analysis II

Not all variables need to be at the same unit of analysis
Just the outcome (Y), the regressor of interest and the error term

“Identifying variation” comes from the unit of analysis

$Y_{sgt} = \alpha + \beta_1 X_{1sgt} + \beta_2 X_{2st} + \varepsilon_{sgt}$

The above regression uses variation from:


Underlying Regression Assumptions

The Required Underlying Regression Assumptions are (***** = IMPORTANT):

1 Correct specification: $Y_i = \alpha + \beta X_i + \varepsilon_i$ (linearity and additivity) ****
2 Exogeneity: $E[\varepsilon \mid X] = 0$ *****
3 No perfect multicollinearity
4 Homoskedasticity: $Var[\varepsilon \mid X] = \sigma^2$

The other two often used Assumptions are:

1 Normality: $\varepsilon \mid X \sim N(0, \sigma^2)$
2 Observations are i.i.d.


Regression Assumption 1

Specification is correct: $Y_i = \alpha + \beta X_i + \varepsilon_i$ (linearity and additivity)

While this assumption is important, we generally have to assume some functional form. If we believe this is wrong we can add interactions or polynomials:

Interaction: $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + \varepsilon_i$
Polynomial: $Y_i = \alpha + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$


Regression Assumption 2

Exogeneity: $E[\varepsilon \mid X] = 0$. This is BY FAR the most important assumption.

The assumption for the regression equation $Y_i = \alpha + \beta_1 X_{1i} + \varepsilon_i$ is violated when an omitted variable, $X_2$, is correlated with BOTH the outcome and $X_1$. I.e.:

$Corr(Y, X_2) \neq 0$
$Corr(X_1, X_2) \neq 0$

If both conditions hold, the estimate of $\beta_1$ is BIASED
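The size of the bias can be worked out directly. If the true model includes a second regressor X2 with coefficient beta2 but X2 is left out, the OLS slope on X1 picks up beta1 + beta2 * Cov(X1, X2) / Var(X1). The sketch below checks this on a small made-up dataset; all numbers and the helper `slope` are assumptions for illustration, and the data are noise-free so the arithmetic is exact.

```python
def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx

# Assumed true model: Y = 1 + 2*X1 + 3*X2, so beta1 = 2 and beta2 = 3.
beta1, beta2 = 2.0, 3.0
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
# X2 is correlated with X1: X2 = 0.5*X1 + e, where e is uncorrelated with X1.
e = [1.0, -2.0, 2.0, -2.0, 1.0]
x2 = [0.5 * a + b for a, b in zip(x1, e)]
y = [1.0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]

short_slope = slope(x1, y)                  # regression that omits X2
predicted = beta1 + beta2 * slope(x1, x2)   # beta1 + beta2 * Cov(X1, X2) / Var(X1)
print(f"true beta1 = {beta1}, short-regression slope = {short_slope:.2f}")
print(f"slope predicted by the omitted variable bias formula = {predicted:.2f}")
```

In this constructed example the regression that omits X2 returns 3.5 rather than the true 2, exactly the amount the formula predicts.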


Regression Assumption 2 Continued

Exogeneity: $E[\varepsilon \mid X] = 0$.

Much of what empirical economists do is to find a way to make this assumption hold

We do it by shutting down the link between $X_1$ and $X_2$ (i.e. make $Corr(X_1, X_2) = 0$)


The Other Assumptions

No perfect multicollinearity
Not really a problem; just do not add collinear variables together in a regression (or let STATA solve the problem)

Homoskedasticity: $Var[\varepsilon \mid X] = \sigma^2$
Not a problem; can allow for heteroskedastic or clustered standard errors easily. In STATA for heteroskedastic put “,r” after the “reg” command

Normality - Does not affect bias, only efficiency of OLS
i.i.d. - Important for serial or auto-correlation of error terms. We can (somewhat) correct for this.


Introduction

What do you think of these studies? (that we see all the time in newspapers)

1 http://www.nbcnews.com/health/cancer/university-texas-study-links-meat-kidney-cancer-n459811

2 http://hereandnow.wbur.org/2016/01/06/sugar-breast-cancer-study


Correlation and Causation

Correlation:

Causation:

1 http://www.tylervigen.com/spurious-correlations


Causal Inference

For any causal statement you should be able to answer all the following questions:

1 What is the unit of analysis?
2 What is the treatment?
3 What outcome are we interested in?
4 What are the counterfactual outcomes?
5 What is the causal link?
6 How is the counterfactual mimicked? Does this sound reasonable?


Causal Inference

A good way to think about these is to do a thought experiment and think about ‘treated’ and ‘untreated’
Suppose we have two types of people. People A get a drug. People B do not. We are interested in their blood pressure.

What is our treatment?

What is our counterfactual?


Causal Inference

I am going to introduce some math notation here:
The outcome for each treated person is: $Y_{1,i}$
The outcome for each untreated person is: $Y_{0,i}$

What are the outcomes for the treated?

For the untreated?

What is the treatment effect?

What is the selection bias?
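One standard way to organize the answers is the potential-outcomes decomposition. The indicator $D_i$ (equal to 1 if person $i$ is treated and 0 otherwise) is notation assumed here; it does not appear on the slide. The observed difference in average outcomes between treated and untreated people splits into the average treatment effect on the treated plus a selection bias term:

```latex
\begin{align*}
\underbrace{E[Y_{1,i} \mid D_i = 1] - E[Y_{0,i} \mid D_i = 0]}_{\text{observed difference}}
  &= \underbrace{E[Y_{1,i} \mid D_i = 1] - E[Y_{0,i} \mid D_i = 1]}_{\text{treatment effect on the treated}} \\
  &\quad + \underbrace{E[Y_{0,i} \mid D_i = 1] - E[Y_{0,i} \mid D_i = 0]}_{\text{selection bias}}
\end{align*}
```

The selection bias term is zero when, absent treatment, the treated and untreated groups would have had the same average outcome.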


Thought experiments to regressions

How can we find the difference in outcomes between people A and B?

$\mu_A - \mu_B$ or can regress:
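The two approaches give the same number: regressing the outcome on a dummy variable that equals 1 for group A and 0 for group B returns the difference in group means as the slope. A minimal sketch with made-up blood pressure readings (the data, the dummy name `treated`, and the helper `slope` are assumptions for illustration):

```python
from statistics import mean

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx

# Made-up blood pressure readings: group A got the drug, group B did not.
bp_a = [118.0, 122.0, 120.0, 116.0]
bp_b = [130.0, 126.0, 134.0, 128.0, 132.0]

diff_in_means = mean(bp_a) - mean(bp_b)

# Stack both groups and regress blood pressure on a treatment dummy.
treated = [1.0] * len(bp_a) + [0.0] * len(bp_b)
outcome = bp_a + bp_b
beta_hat = slope(treated, outcome)

print(f"mu_A - mu_B = {diff_in_means:.2f}")
print(f"coefficient on the treatment dummy = {beta_hat:.2f}")
```

Either number is the causal effect of the drug only if there is no selection bias between the two groups.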


Example

Let’s try doing the causal statements for the following article:
http://www.nbcnews.com/health/cancer/university-texas-study-links-meat-kidney-cancer-n459811

1 What is the unit of analysis?
2 What is the treatment?
3 What outcome are we interested in?
4 What are the counterfactual outcomes?
5 What is the causal link?
6 How is the counterfactual mimicked? Does this sound reasonable?


Causal Inference

There are 5 basic empirical methods to obtain causal inference:

1 Controls (includes matching/fixed-effects)
2 Randomized Experiments
3 Difference-in-Differences
4 Instrumental Variables
5 Regression Discontinuity


Control Variables

The main problem of causal inference is the possibility of omitted variable bias (OVB)

So why do we not just control for all omitted variables?


Control Variables

So while controls do not seem great at getting causality, they help by:

Eliminating obvious OVB
Reducing standard errors

For this reason, controls are almost always included in every regression


Control Variables

http://well.blogs.nytimes.com/2015/08/19/researchers-link-longer-work-hours-and-stroke-risk/?_r=0

1 What is the unit of analysis?
2 What is the treatment?
3 What outcome are we interested in?
4 What are the counterfactual outcomes?
5 What is the causal link?
6 How is the counterfactual mimicked? Does this sound reasonable?


Control Variables

$\text{stroke}_i = \alpha + \beta_1 \text{HoursWorked}_i + \beta_2 \text{Controls}_i + \varepsilon_i$

What controls could you realistically never put in the above regression that may lead to OVB?


Control Variables

Whether you add a control or not often depends on qualitative reasoning

For this reason, researchers often report many regressions, using varying sets of controls:

In general, add variables that should either:
Directly affect the outcome
Proxy for another unobserved variable that affects the outcome


Table 1: Difference-in-Differences Estimates of CSR on Private School Share

Outcome Variable: Private School Share (%)

                          (1)         (2)         (3)         (4)
Treatment*Post          -1.41***    -1.35***    -0.91***    -1.32***
                        (0.17)      (0.18)      (0.28)      (0.27)
Treatment                2.87***     2.82***     4.33***     3.32***
                        (0.25)      (0.52)      (0.60)      (0.46)
Post                    -0.73***     0.26*      -0.01        0.24*
                        (0.15)      (0.15)      (0.10)      (0.13)
Year/Grade FE            No          Yes         Yes         Yes
Demographic Controls     No          No          Yes         Yes
District FE              No          No          No          Yes
Number of Observations   253,056     253,056     188,210     188,210


Control Variables

For example,
$\text{stroke}_i = \alpha + \beta_1 \text{HoursWorked}_i + \beta_2 \text{HoursExercised}_i + \beta_3 \text{Race}_i + \varepsilon_i$

What is the control $\text{HoursExercised}_i$ for?

What is the control $\text{Race}_i$ for?


Fixed-effects

Fixed-effects can be a bit confusing. Often they are a control. Sometimes they can be used for causal inference.

As a control:
Fixed effects are just a bunch of dummy variables. They are added as controls just like any other variable.

e.g. Time FEs, province FEs, school FEs

For time FEs, if you have 10 years of data you:
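A common way to build such dummies is one indicator per year with one year left out as the reference category, since a full set of year dummies plus the constant would be perfectly collinear. The sketch below assumes ten hypothetical years (2006 to 2015) and a hypothetical helper `year_dummies`; with 10 years of data it produces 9 dummy columns.

```python
def year_dummies(years, reference):
    """Map each observation's year to 0/1 indicators, omitting the reference year."""
    categories = sorted(set(years) - {reference})
    rows = [[1 if y == c else 0 for c in categories] for y in years]
    return categories, rows

# Hypothetical data covering 10 years; 2006 is the omitted (reference) year.
observed_years = list(range(2006, 2016))
columns, dummies = year_dummies(observed_years, reference=2006)

print(f"{len(set(observed_years))} years -> {len(columns)} dummy variables")
print("observation from 2006:", dummies[0])  # all zeros: the reference year
print("observation from 2010:", dummies[4])  # a single 1 in the 2010 column
```

Each dummy's coefficient is then read relative to the omitted year.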


Fixed-effects II

For causal inference:

Our essential concern is that people who work longer hours also differ in other dimensions (e.g. diet)

Idea: Why not control for the fact they are the same person?
Must have panel data (i.e. variation over time within the same person)

Essentially we estimate the effect of an increase/decrease in working hours for person X on their likelihood of stroke


Fixed-effects in a Regression

In a regression, you write down a fixed effect without a β in front. The subscript then denotes the fixed-effect.

$\text{stroke}_{it} = \alpha + \beta_1 \text{HoursWorked}_{it} + \beta_2 \text{Controls}_{it} + \lambda_t + \delta_i + \varepsilon_{it}$

$\delta_i =$

$\lambda_t =$
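Including a person fixed effect is numerically equivalent to subtracting each person's own average from their outcome and regressor before running OLS (the within transformation), so only changes over time within a person identify the coefficient on hours worked. A minimal sketch on a made-up two-person panel follows; the numbers, the variable names, and the helpers `slope` and `demean_by_person` are all assumptions for illustration, and the outcome is a generic continuous risk score rather than actual stroke data.

```python
from collections import defaultdict
from statistics import mean

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx

def demean_by_person(person_ids, values):
    """Subtract each person's own mean (the within transformation)."""
    by_person = defaultdict(list)
    for pid, v in zip(person_ids, values):
        by_person[pid].append(v)
    person_mean = {pid: mean(vs) for pid, vs in by_person.items()}
    return [v - person_mean[pid] for pid, v in zip(person_ids, values)]

# Made-up panel: risk = person effect + 0.5 * hours, with no noise.
# Person "b" works longer hours AND has a lower baseline risk (e.g. better diet).
person = ["a", "a", "a", "b", "b", "b"]
hours = [30.0, 35.0, 40.0, 50.0, 55.0, 60.0]
effect = {"a": 20.0, "b": 0.0}
risk = [effect[p] + 0.5 * h for p, h in zip(person, hours)]

pooled = slope(hours, risk)  # ignores who is who
within = slope(demean_by_person(person, hours), demean_by_person(person, risk))
print(f"pooled slope = {pooled:.3f}, fixed-effects (within) slope = {within:.3f}")
```

In this constructed example the pooled slope is negative even though the within-person effect is 0.5, because the person effect is correlated with hours worked.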


Using Fixed-effects

Implementing fixed-effects:

Open up STATA


Pros and Cons

What is the major bias concern in fixed-effects?

Pros vs. Cons of fixed-effects:


Randomized Experiments

Randomized experiments (or RCTs) are the “gold standard” of policy evaluation

Unfortunately, they are really tough to get off the ground

Also, some questions are not amenable to RCTs
For example:


Implementation

Implementation of randomized experiments can be difficult.

Before the experiment can be run you need to:
1 Find necessary sample size (use “ssi” in STATA)
2 Get funding (they are often very expensive)
3 Get ethical approval (can be very difficult)
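To give a feel for the sample size step, here is a minimal Python sketch of the textbook normal-approximation formula for comparing two means with equal-sized arms: n per arm = 2(z_{1-α/2} + z_{power})² (σ/effect)². This is an assumption-laden illustration, not a description of what the Stata command does; the function name `n_per_arm` and the default α = 0.05 and power = 0.80 are choices made here, and a t-based calculation gives slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided test of a difference in means.

    effect: smallest difference in means worth detecting (same units as sd)
    sd:     assumed outcome standard deviation, taken as equal in both arms
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 * (sd / effect) ** 2)

# Hypothetical effect sizes, measured in standard deviations of the outcome.
n_half_sd = n_per_arm(effect=0.5, sd=1.0)
n_fifth_sd = n_per_arm(effect=0.2, sd=1.0)
print(n_half_sd, n_fifth_sd)
```

The required sample grows with the inverse square of the effect size, which is one reason experiments aimed at small effects become expensive so quickly.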

Implementation II

Afterwards, you need to:
1 Randomize
  Since experiments are so expensive, researchers often “stratify” by some characteristic to guard against improper randomization due to “luck”; stratifying guarantees balance between treatment and control on that characteristic
2 Ensure there is limited attrition (often the bane of randomized trials)
3 Ensure there is no cross-contamination
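Stratified randomization can be sketched in a few lines of Python: group units by the stratifying characteristic, shuffle within each group, and treat half of each group. The function name `stratified_assign`, the income strata and the 20 units below are hypothetical; in this sketch, when a stratum has an odd number of units the extra unit goes to control.

```python
import random
from collections import defaultdict

def stratified_assign(units, stratum_of, seed=0):
    """Randomly assign treatment (1) or control (0) separately within each stratum."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    by_stratum = defaultdict(list)
    for u in units:
        by_stratum[stratum_of[u]].append(u)

    assignment = {}
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)           # random order within the stratum
        half = len(members) // 2
        for k, u in enumerate(members):
            assignment[u] = 1 if k < half else 0
    return assignment

# Hypothetical sample: 8 low-income and 12 high-income units.
units = list(range(20))
stratum_of = {u: ("low_income" if u < 8 else "high_income") for u in units}
assignment = stratified_assign(units, stratum_of)

for s in ("low_income", "high_income"):
    treated = sum(assignment[u] for u in units if stratum_of[u] == s)
    total = sum(1 for u in units if stratum_of[u] == s)
    print(s, "treated", treated, "of", total)
```

With simple (unstratified) randomization the treated share of low-income units could differ from one half by chance; here it is fixed by construction.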

Evaluating Randomized Trials

To evaluate randomized trials, researchers look at internal and external validity

We will do this for Project STAR

Internal Validity

For internal validity we look at:

1 Proper Randomization (look for covariate balance)

2 Differential Attrition

3 Cross Contamination

4 Hawthorne Effects (could also be under external validity)
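The first two checks can be illustrated with a short Python/NumPy sketch: a standardized mean difference in a baseline covariate (covariate balance) and the gap in dropout rates between arms (differential attrition). The data are simulated purely for illustration and are not Project STAR; the function names, the baseline score, and the 10% versus 20% dropout rates are all assumptions.

```python
import numpy as np

def standardized_difference(x, treated):
    """Difference in means of a baseline covariate, in pooled standard deviation units."""
    x = np.asarray(x, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    x1, x0 = x[treated], x[~treated]
    pooled_sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2)
    return float((x1.mean() - x0.mean()) / pooled_sd)

def attrition_gap(dropped, treated):
    """Dropout rate among the treated minus the dropout rate among controls."""
    dropped = np.asarray(dropped, dtype=bool)
    treated = np.asarray(treated, dtype=bool)
    return float(dropped[treated].mean() - dropped[~treated].mean())

# Simulated experiment (assumed values): half of 2000 units are treated at random.
rng = np.random.default_rng(1)
n = 2000
treated = rng.permutation(n) < n // 2
baseline_score = rng.normal(50, 10, size=n)  # measured before treatment

# Assumed attrition: controls drop out more often than treated units.
dropped = rng.random(n) < np.where(treated, 0.10, 0.20)

smd = standardized_difference(baseline_score, treated)
gap = attrition_gap(dropped, treated)
print(f"standardized difference = {smd:.3f}, attrition gap = {gap:.3f}")
```

A standardized difference near zero is what proper randomization should produce, while a large gap in dropout rates warns that the units remaining in each arm may no longer be comparable.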

External Validity

For external validity we look at:

1 Generalizability (i.e. sample selection)

2 Scalability

3 General Equilibrium Effects

Project STAR

What is Project STAR?

Open up STATA....