binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned...

89
binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter

Transcript of binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned...

Page 1: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatter:Binned Scatterplots in Stata

Michael Stepner

MIT

August 1, 2014

Michael Stepner binscatter

Page 2: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Motivation

Binned scatterplots are an informative and versatile way ofvisualizing relationships between variables

They are useful for:

I Exploring your data

I Communicating your results

Intimately related to regression

Any coefficient of interest from an OLS regression can bevisualized with a binned scatterplot

Can graphically depict modern identification strategies

RD, RK, event studies

Michael Stepner binscatter

Page 3: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Familiar Ground

Michael Stepner binscatter

Page 4: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Scatter Plots

Scatterplots:

Are the most basic way of visually representing therelationship between two variables

Show every data point

Become crowded when you have lots of observationsI Very informative in small samples

I Not so useful with big datasets

Page 5: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

01

02

03

04

0h

ou

rly w

ag

e

0 5 10 15 20 25job tenure (years)

N=2231

Source: National Longitudinal Survey of Women 1988 (nlsw88)

Page 6: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

OLS Regression

Linear regression:

Gives a number (coefficient) that describes the observedassociation

I “On average, 1 extra year of job tenure is associated with an$m higher wage”

Gives us a framework for inference about the relationship(statistical significance, confidence intervals, etc.)

Page 7: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

. reg wage tenure

Source | SS df MS Number of obs = 2231

-------------+------------------------------ F( 1, 2229) = 72.66

Model | 2339.38077 1 2339.38077 Prob > F = 0.0000

Residual | 71762.4469 2229 32.1949066 R-squared = 0.0316

-------------+------------------------------ Adj R-squared = 0.0311

Total | 74101.8276 2230 33.2295191 Root MSE = 5.6741

------------------------------------------------------------------------------

wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

tenure | .1858747 .0218054 8.52 0.000 .1431138 .2286357

_cons | 6.681316 .1772615 37.69 0.000 6.333702 7.028931

------------------------------------------------------------------------------

Page 8: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

01

02

03

04

0h

ou

rly w

ag

e

0 5 10 15 20 25job tenure (years)

Page 9: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatter: step-by-step introduction

Michael Stepner binscatter

Page 10: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatter: step-by-step introduction

Let’s walk through what happens when you type:

. binscatter wage tenure

Michael Stepner binscatter

Page 11: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

01

02

03

04

0h

ou

rly w

ag

e

0 5 10 15 20 25job tenure (years)

Page 12: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

01

02

03

04

0h

ou

rly w

ag

e

0 5 10 15 20 25job tenure (years)

Page 13: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

01

02

03

04

0h

ou

rly w

ag

e

0 5 10 15 20 25job tenure (years)

Page 14: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

01

02

03

04

0h

ou

rly w

ag

e

0 5 10 15 20 25job tenure (years)

Page 15: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

01

02

03

04

0h

ou

rly w

ag

e

0 5 10 15 20 25job tenure (years)

Page 16: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

01

02

03

04

0h

ou

rly w

ag

e

0 5 10 15 20 25job tenure (years)

Page 17: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

01

02

03

04

0h

ou

rly w

ag

e

0 5 10 15 20 25job tenure (years)

Page 18: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

. binscatter wage tenure

67

89

10

wa

ge

0 5 10 15 20tenure

Page 19: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatter: Summary

To create a binned scatterplot, binscatter

1 Groups the x-axis variable into equal-sized bins

2 Computes the mean of the x-axis and y-axis variables withineach bin

3 Creates a scatterplot of these data points

4 Draws the population regression line

binscatter supports weights

I weighted bins

I weighted means

I weighted regression line

Michael Stepner binscatter

Page 20: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Binscatter and Regression: intimately linked

Michael Stepner binscatter

Page 21: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Conditional Expectation Function

Consider two random variables: Yi and Xi

The conditional expectation function (CEF) is

E[Yi |Xi = x ] ≡ h(x)

The CEF tells us the mean value of Yi when we see Xi = x

The CEF is the best predictor of Yi given Xi

I in the sense that it minimizes Mean Squared Error

Michael Stepner binscatter

Page 22: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Regression CEF Theorem

Suppose we run an OLS regression:

Yi = α + βXi + ε

We obtain the estimated coefficients α, β

Regression fit line: h(x) = α + βx

Regression CEF Theorem:

The regression fit line h(x) = α + βx is the best linearapproximation to the CEF, h(x) = E[Yi |Xi = x ]

I in the sense that it minimizes Mean Squared Error

Michael Stepner binscatter

Page 23: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatter: CEF and regression fit line

A typical binned scatterplot shows two related objects:

a non-parametric estimate of the CEFI the binned scatter points

the best linear estimate of the CEFI the regression fit line

Michael Stepner binscatter

Page 24: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

6 7

8 9

10

wag

e

0 5 10 15 20 tenure

h(x) = α + βx

E[ y | Q1 < x ≤ Q2 ]

E[ y | Q8 < x ≤ Q9 ]

Page 25: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Interpreting binscatters

Michael Stepner binscatter

Page 26: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatters: informative about standard errors

If the binned scatterpoints are tight to the regression line, theslope is precisely estimated

I regression standard error is small

If the binned scatterpoints are dispersed around the regressionline, the slope is imprecisely estimated

I regression standard error is large

I Dispersion of binned scatterpoints around the regression lineindicates statistical significance

Michael Stepner binscatter

Page 27: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

05

10

15

wa

ge

0 5 10 15 20tenure

ε ~ N(0, 0.2)

68

10

12

wa

ge

0 5 10 15 20tenure

Coef = 0.199 (0.002)

05

10

15

wa

ge

0 5 10 15 20tenure

ε ~ N(0, 2.0)

68

10

12

wa

ge

0 5 10 15 20tenure

Coef = 0.208 (0.025)

Page 28: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatters: not informative about R2

R2 tells you what fraction of the individual variation in Y isexplained by the regressors

A binned scatterplot collapses all the individual variation,showing only the mean within each bin

Michael Stepner binscatter

Page 29: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

01

02

03

04

0h

ou

rly w

ag

e

0 5 10 15 20 25job tenure (years)

Page 30: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

01

02

03

04

0h

ou

rly w

ag

e

0 5 10 15 20 25job tenure (years)

Page 31: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

01

02

03

04

0h

ou

rly w

ag

e

0 5 10 15 20 25job tenure (years)

Page 32: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

01

02

03

04

0h

ou

rly w

ag

e

0 5 10 15 20 25job tenure (years)

Page 33: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatters: not informative about R2

The same binscatter can be generated with:

enormous variance in Y |X = x

or almost no individual variance

I because binscatter only shows E[Y |X = x ]

Michael Stepner binscatter

Page 34: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

05

10

15

wa

ge

0 5 10 15 20tenure

N=200 R2 = 0.967

78

910

11

wa

ge

0 5 10 15 20tenure

Coef = 0.200 (0.003)

05

10

15

wa

ge

0 5 10 15 20tenure

N=2000 R2 = 0.467

78

910

11

wa

ge

0 5 10 15 20tenure

Coef = 0.200 (0.005)

Page 35: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatters: informative about functional form

Many different forms of underlying data can give the sameregression results

I Some examples from Anscombe (1973)...

Michael Stepner binscatter

Page 36: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

05

1015

0 5 10 15 20

Anscombe (1973): Dataset 1Earnings

($1000)

Years of Schooling

β=0.5 (0.12)

Public Economics Lectures () Part 1: Introduction 9 / 49

Page 37: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

05

1015

0 5 10 15 20

Earnings

($1000)

Years of Schooling

β=0.5 (0.12)

Anscombe (1973): Dataset 2

Public Economics Lectures () Part 1: Introduction 10 / 49

Page 38: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

05

1015

0 5 10 15 20

Earnings

($1000)

Years of Schooling

β=0.5 (0.12)

Anscombe (1973): Dataset 3

Public Economics Lectures () Part 1: Introduction 11 / 49

Page 39: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

05

1015

0 5 10 15 20

Earnings

($1000)

Years of Schooling

β=0.5 (0.12)

Anscombe (1973): Dataset 4

Public Economics Lectures () Part 1: Introduction 12 / 49

Page 40: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatters: informative about functional form

Suppose the true data generating process is logarithmic

wagei = 10 + log(tenurei ) + εi

. reg wage tenure

Source | SS df MS Number of obs = 500

-------------+------------------------------ F( 1, 498) = 281.25

Model | 317.940139 1 317.940139 Prob > F = 0.0000

Residual | 562.975924 498 1.13047374 R-squared = 0.3609

-------------+------------------------------ Adj R-squared = 0.3596

Total | 880.916063 499 1.76536285 Root MSE = 1.0632

------------------------------------------------------------------------------

wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

tenure | .1841569 .0109811 16.77 0.000 .1625819 .2057318

_cons | 10.28268 .0961947 106.89 0.000 10.09369 10.47168

------------------------------------------------------------------------------

Michael Stepner binscatter

Page 41: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatters: informative about functional form

Now forget that I ever told you that...

You’re just handed the data.

. reg wage tenure

Source | SS df MS Number of obs = 500

-------------+------------------------------ F( 1, 498) = 281.25

Model | 317.940139 1 317.940139 Prob > F = 0.0000

Residual | 562.975924 498 1.13047374 R-squared = 0.3609

-------------+------------------------------ Adj R-squared = 0.3596

Total | 880.916063 499 1.76536285 Root MSE = 1.0632

------------------------------------------------------------------------------

wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

tenure | .1841569 .0109811 16.77 0.000 .1625819 .2057318

_cons | 10.28268 .0961947 106.89 0.000 10.09369 10.47168

------------------------------------------------------------------------------

Michael Stepner binscatter

Page 42: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatters: informative about functional form

Run a linear regression:

wagei = α + βtenurei + εi

. reg wage tenure

Source | SS df MS Number of obs = 500

-------------+------------------------------ F( 1, 498) = 281.25

Model | 317.940139 1 317.940139 Prob > F = 0.0000

Residual | 562.975924 498 1.13047374 R-squared = 0.3609

-------------+------------------------------ Adj R-squared = 0.3596

Total | 880.916063 499 1.76536285 Root MSE = 1.0632

------------------------------------------------------------------------------

wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

tenure | .1841569 .0109811 16.77 0.000 .1625819 .2057318

_cons | 10.28268 .0961947 106.89 0.000 10.09369 10.47168

------------------------------------------------------------------------------

Michael Stepner binscatter

Page 43: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatters: informative about functional form

. binscatter wage tenure

91

01

11

21

3w

ag

e

0 5 10 15tenure

Michael Stepner binscatter

Page 44: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatters: informative about functional form

. binscatter wage tenure

91

01

11

21

3w

ag

e

0 5 10 15tenure

Michael Stepner binscatter

Page 45: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatters: informative about functional form

If the underlying CEF is smooth, binscatter provides aconsistent estimate of the CEF

I As N gets large, holding the number of quantiles constant,each binned scatter point approaches the true conditionalexpectation

Michael Stepner binscatter

Page 46: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatters: informative about functional form

. binscatter wage tenure in 1/500

91

01

11

21

3w

ag

e

0 5 10 15tenure

N=500

Michael Stepner binscatter

Page 47: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatters: informative about functional form

. binscatter wage tenure in 1/5000

91

01

11

21

3w

ag

e

0 5 10 15tenure

N=5,000

Michael Stepner binscatter

Page 48: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatters: informative about functional form

. binscatter wage tenure in 1/5000000

91

01

11

21

3w

ag

e

0 5 10 15tenure

N=5,000,000

Michael Stepner binscatter

Page 49: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Interpreting binscatters: moral of the story

1 Binned scatterplots are informative about standard errors

2 Binned scatterplots are not informative about R2

3 And binned scatterplots are informative about functional form

Michael Stepner binscatter

Page 50: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

How many bins?

Michael Stepner binscatter

Page 51: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

How many bins?

What is the “best” number of bins to use?

Default in binscatter is 20

in my personal experience, this default works very well

Optimal number of bins to accurately represent the CEFdepends on curvature of the underlying CEF

which is unknown (that’s why we’re approximating it!)

I a smooth function can be well approximated with few points

I a function with complex local behaviour requires many pointsto approximate its shape

Michael Stepner binscatter

Page 52: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Let’s play a quick game of...

What function is it?

Page 53: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Round 1:

05

10

15

20

y

0 5 10 15 20x

5 bins

Michael Stepner binscatter

Page 54: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Round 1: Linear

05

10

15

20

y

0 5 10 15 20x

20 bins

Michael Stepner binscatter

Page 55: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Round 2:

05

10

15

20

y

0 5 10 15 20x

5 bins

Michael Stepner binscatter

Page 56: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Round 2: Cubic

05

10

15

20

y

0 5 10 15 20x

20 bins

Michael Stepner binscatter

Page 57: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Round 3:

05

10

15

20

y

0 5 10 15 20x

5 bins

Michael Stepner binscatter

Page 58: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Round 3:

05

10

15

20

y

0 5 10 15 20x

20 bins

Michael Stepner binscatter

Page 59: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Round 3: Sinusoidal

05

10

15

20

y

0 5 10 15 20x

100 bins

Michael Stepner binscatter

Page 60: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatter: Multivariate Regression

Michael Stepner binscatter

Page 61: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatter: Multivariate Regression

The use of binned scatterplots is not restricted to studyingsimple relationships with one x-variable

binscatter can use partitioned regression to illustrate therelationship between two variables while controlling for otherregressors

Michael Stepner binscatter

Page 62: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Partitioned regression: FWL theorem

Suppose we’re interested in the relationship between y and xin the following multivariate regression:

y = α + βx + ΓZ + ε

Option 1: Run the full regression with all regressors, obtain β

Option 2: Partitioned regression

1 Regress y on Z ⇒ residuals ≡ y

2 Regress x on Z ⇒ residuals ≡ x

3 Regress y on x ⇒ coefficient = β

I The β obtained using full regression and partitioned regressionare identical

Michael Stepner binscatter

Page 63: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

binscatter: Applying partitioned regression

We’re interested in the relationship between wage and tenure,but want to control for total work experience:

wage = α + β tenure + γ experience + ε

Could directly apply partitioned regression:

. reg wage experience

. predict wage_r, residuals

. reg tenure experience

. predict tenure_r, residuals

. binscatter wage_r tenure_r

The procedure is built into binscatter:

. binscatter wage tenure, controls(experience)

Michael Stepner binscatter

Page 64: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

. binscatter wage tenure, controls(experience)

56

78

91

0w

ag

e

0 5 10 15 20tenure

Coef = 0.19 (0.02)

Page 65: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

. binscatter wage tenure, controls(experience)

56

78

91

0w

ag

e

−5 0 5 10 15tenure

Coef = 0.04 (0.03)

Page 66: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

by-variables

Michael Stepner binscatter

Page 67: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Plotting multiple series using a by-variable

binscatter will plot a separate series for each group

each by-value has its own scatterpoints and regression line

the by-values share a common set of binsI constructed from the unconditional quantiles of the x-variable

Michael Stepner binscatter

Page 68: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

. binscatter wage age, by(race)

46

81

0w

ag

e

34 36 38 40 42 44age

race=white race=black

Page 69: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

. binscatter wage age, by(race) absorb(occupation)

46

81

0w

ag

e

34 36 38 40 42 44age

race=white race=black

Page 70: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

RD and RK designs

Michael Stepner binscatter

Page 71: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

RD and RK designs

Binned scatterplots are very useful for illustratingregression discontinuities (RD) or regression kinks (RK)

Consider a wage schedule where the first 3 yearsare probationary

After 3 years, receive a salary bump

After 3 years, steady increase in salary for each additional year

Michael Stepner binscatter

Page 72: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

RD design

. binscatter wage tenure, line(none)

78

91

01

1w

ag

e

0 2 4 6 8 10 12tenure

Michael Stepner binscatter

Page 73: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

RD design

. binscatter wage tenure, discrete line(none)

78

91

01

1w

ag

e

0 2 4 6 8 10 12tenure

Michael Stepner binscatter

Page 74: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

RD design

. binscatter wage tenure, discrete rd(2.5)

78

91

01

1w

ag

e

0 2 4 6 8 10 12tenure

Michael Stepner binscatter

Page 75: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

RK design

The firm decides to cap the wage schedule after15 years of tenure

No more salary increases past 15 years

Michael Stepner binscatter

Page 76: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

RK design

. binscatter wage tenure, line(none)

78

91

01

11

2w

ag

e

0 5 10 15 20 25tenure

Michael Stepner binscatter

Page 77: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

RK design

. binscatter wage tenure, discrete line(none)

78

91

01

11

2w

ag

e

0 5 10 15 20 25tenure

Michael Stepner binscatter

Page 78: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

RK design

. binscatter wage tenure, discrete rd(2.5 14.5)

78

91

01

11

2w

ag

e

0 5 10 15 20 25tenure

Michael Stepner binscatter

Page 79: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

RD and RK designs

Important caution:

The rd() option in binscatter only affects the regression lines

I It does not affect the binning procedure

I A bin could contain observations on both sides of thediscontinuity, and average them together

Implications:

Doesn’t matter with discrete x-variable and option discreteI No binning is performed, each x-value is its own bin

With continuous x-variable, need to manually create binsI Use xq() to specify variable with correctly constructed bins

I A future version of binscatter respect RDs when binning

Michael Stepner binscatter

Page 80: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Event Studies

Michael Stepner binscatter

Page 81: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Event Studies

binscatter makes it easy to create event study plots.

Suppose we have a panel of people, with yearly observations oftheir wage and employer:

We observe when people change employers

For each person with a job switch

Define year 0 as the year they start a new job

I So year -1 is the year before a job switch

I Year 1 is the year after a job switch

Michael Stepner binscatter

Page 82: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

. binscatter wage eventyear, line(connect) xline(-0.5)

77

.58

8.5

9w

ag

e

−4 −3 −2 −1 0 1 2 3 4Years Since Job Switch

Page 83: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Event Studies

Now suppose we also know whether they were laid off at theirprevious job.

I Does the wage experience of people who are laid off differfrom those who quit voluntarily?

Michael Stepner binscatter

Page 84: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

. binscatter wage eventyear, line(connect) xline(-0.5)

> by(layoff)

78

91

01

1w

ag

e

−4 −3 −2 −1 0 1 2 3 4Years Since Job Switch

layoff=0 layoff=1

Page 85: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Final Remarks

Michael Stepner binscatter

Page 86: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Final Remarks

binscatter is optimized to run quickly and efficiently in largedatasets

It can be installed from the Stata SSC repository

I ssc install binscatter

These slides and other documentation is posted on thebinscatter website:

www.michaelstepner.com/binscatter

Michael Stepner binscatter

Page 87: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

References

Michael Stepner binscatter

Page 88: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

Examples of binscatter used in research

Chetty, Raj, John N Friedman, and Emmanuel Saez. 2013. “UsingDifferences in Knowledge Across Neighborhoods to Uncover the Impacts ofthe EITC on Earnings.” American Economic Review, 103 (7): 2683–2721.

Chetty, Raj, John N. Friedman, and Jonah Rockoff. 2014. “Measuring theImpacts of Teachers I: Evaluating Bias in Teacher Value-Added Estimates.”American Economic Review, forthcoming.

Chetty, Raj, John N. Friedman, and Jonah Rockoff. 2014. “Measuring theImpacts of Teachers II: Teacher Value-Added and Student Outcomes inAdulthood.” American Economic Review, forthcoming.

Chetty, Raj, John N. Friedman, Soren Leth-Petersen, Torben Nielsen, andTore Olsen. 2013. “Active vs. Passive Decisions and Crowdout inRetirement Savings Accounts: Evidence from Denmark.” Quarterly Journalof Economics, forthcoming.

Michael Stepner binscatter

Page 89: binscatter: Binned Scatterplots in Stata - Michael Stepner ·  · 2018-03-08binscatter: Binned Scatterplots in Stata Michael Stepner MIT August 1, 2014 Michael Stepner binscatter.

References for this talk

Angrist, Joshua D. and Jorn-Steffen Pischke. 2008. Mostly HarmlessEconometrics: An Empiricist’s Companion, Princeton, NJ: PrincetonUniversity Press.

Anscombe, F. J. 1973. “Graphs in Statistical Analysis.” The AmericanStatistician, 27 (1): 17.

Chetty, Raj. 2012. “Econ 2450a: Public Economics Lectures.” Lecture Slides,Harvard University. http://www.rajchetty.com/index.php/lecture-videos.

Michael Stepner binscatter