
216 Journal of Economic Perspectives

Figure 3A: The Basic Column Chart

[Column chart, titled in the original "Figure 2. Discounted Expected Lifetime Earnings, VN(t)". Vertical axis: income in thousands, 400 to 1,100. Categories: finish no school; finish 1 yr; finish 3 yrs; grad 2.0 GPA; grad 3.0 GPA; grad 3.75 GPA.]

Source: Stinebrickner and Stinebrickner (2013).

Figure 3B: The Revised Column Chart

[Chart. Title: Discounted Expected Lifetime Earnings, VN(t) (Income in thousands). Value axis: 0 to 1,200. Categories: Finish no school; Finish 1 year; Finish 3 years; Graduate, 2.0 GPA; Graduate, 3.0 GPA; Graduate, 3.75 GPA.]

Source: Author's calculations using numbers inferred from text in Stinebrickner and Stinebrickner (2013).


An Economist's Guide to Visualizing Data 217

The 3D Chart

Figure 4A uses the now-familiar 3D effect. In such graphs, the third dimension does not plot data values, but it does add clutter to the chart and, worse, it can distort the information. Look at the far-right-hand bar, labeled 6 percent. No point of the column touches the gridline for that value. This software tool, like many others, uses perspective to give depth to the imaginary plane that runs across the top of the column, intersecting the gridline. But most readers will perceive the actual value of the column as less than 6 percent. Figure 4B shows a redesign: cancel the 3D treatment and integrate the disconnected legend with the graph. Notice that inserting the common baseline (portrayed in the original by a hovering, barely perceptible thin gray line) permits a more effective comparison among groups.

The Unbalanced Chart

The source material for Figure 5A originally appeared in an interactive visualization on the Organisation for Economic Co-operation and Development (OECD) website (http://www.oecd.org/gender/data/proportionofemployedwhoareseniormanagersbysex.htm); a static version was later reproduced in a New York Times Economix blog post (http://economix.blogs.nytimes.com/2013/04/02/comparing-the-worlds-glass-ceilings/?_r=2).

Figure 4A: A 3D Chart

[3D column chart. Title: Change in real weekly wages of US-born workers by group, 1990-2006. Vertical axis: -6.0 to 12.0. Categories: Some High School; High School Graduate; Some College; College Graduate. Legend ("groups"): Young (experience below 20 years); Old (experience above 20 years).]

Source: Ottaviano and Peri (2008).

218 Journal of Economic Perspectives

Figure 5A: An Unbalanced Chart

[Paired column chart. Title: Percentage of Employed Who Are Senior Managers, by Sex, 2008. Series: Women; Men. Vertical axis: 0 to 20. Countries, left to right: United States, New Zealand, United Kingdom, Ireland, Australia, Estonia, Belgium, Greece, Canada, Iceland, France, Italy, Netherlands, Finland, OECD average, Hungary, Spain, Israel, Slovenia, Poland, Czech Republic, Switzerland, Austria, Portugal, Norway, Slovak Republic, Germany, Sweden, Luxembourg, Turkey, Denmark, Mexico, Korea.]

Source: Author, based on OECD (no date) and Rampell (2013).

Figure 4B: Flattening a 3D Chart

[Flat column chart with direct data labels and the legend integrated into the graph. Title: Change in real weekly wages of US-born workers by group, 1990-2006 (Percent). Vertical axis: -6 to 12. Categories: Some High School; High School Graduate; Some College; College Graduate. Series: Young (experience below 20 years); Old (experience above 20 years).]

An Economist's Guide to Visualizing Data 221

Figure 6A: A Spaghetti Chart

[Line chart. Title: 27. Initial DI Worker Awards by Major Cause of Disability, Calendar Years 1975-2010. Vertical axis: 0 to 35. Horizontal axis: 1975 to 2010. Series: Mental; Cancer; Circulatory; Musculoskeletal.]

Source: Social Security Advisory Board (2012).

Figure 6B: Revising the Spaghetti Chart

[Small-multiple line charts with labeled endpoints. Title: Initial DI Worker Awards by Major Cause of Disability, Calendar Years 1975-2010 (Percent). Panels: Circulatory; Mental; Musculoskeletal; Cancer. Horizontal axis in each panel: 1975 to 2010.]


An Economist's Guide to Visualizing Data 225

in this case and is a useful approach when labels are difficult to fit in the vertical column chart layout (recall Figure 3; also see Schwabish 2013c). Also notice the (subjective) decision to omit the y-axis; the usefulness of the y-axis is doubtful with data labels placed on top of each column.

Figure 9A: Two Pie Charts for Comparison

[Two pie charts. Title: Aggregate income by source; Shares of Aggregate Income, 1962 and 2007. Labels, 1962: Social Security 30; Earnings 28; Other 16; Asset income 15; Government employee pensions 6; Private pensions 3. Labels, 2007: Social Security 36; Earnings 29; Asset income 16; Private pensions 9; Government employee pensions 8; Other 3.]

Source: Social Security Administration (2009).

Figure 9B: Alternative to a Pie Chart: A Paired Column Chart

[Paired column chart with data labels and no y-axis. Title: Shares of Aggregate Income, 1962 and 2009 (Percent). Categories: Social Security; Earnings; Asset income; Private pensions; Government employee pensions; Other. Series: 1962; 2009.]


226 Journal of Economic Perspectives

Alternatively, the stacked bar chart in Figure 9C shows the distribution of the various groups and that the groups sum to 100 percent, while also highlighting differences from one year to the other. Finally, the slope chart in Figure 9D also shows the difference in each category from the first year to the last by pairing points on two vertical axes. Slope charts can be used for a variety of purposes, including showing correlations: for example, the relationship between a state's obesity rate and the share of people with at least a bachelor's degree (Cairo 2013). In this example, the color contrast (or what appears as different shades of grey in the black-and-white printed version) identifies which categories increased over time (blue, darker) and those that declined (orange, lighter).

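Figure 9C: Alternative to a Pie Chart: A Stacked Bar Chart

[Two stacked bars, one for 1962 and one for 2009. Title: Shares of Aggregate Income, 1962 and 2009 (Percent). Segments: Social Security; Earnings; Other; Asset income; Government employee pensions; Private pensions. Data labels in printed order, 1962 then 2009 for each segment: 30, 38; 28, 29; 18, 4; 15, 11; 6, 9; 3, 9.]

Figure 9D: Alternative to a Pie Chart: The Slope Chart

[Slope chart with two vertical axes, 1962 and 2009. Title: Shares of Aggregate Income, 1962 and 2009 (Percent). Labels on the 1962 axis: Social Security 30; Earnings 28; Other 18; Asset income 15; Government employee pensions 6; Private pensions 3. Labels on the 2009 axis: 38; 29; 11; 9; 9; 4.]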
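The slope chart described above boils down to one start value, one end value, and a color per category. A minimal Python sketch of that preparation step follows. The 1962 shares are the labels printed on Figure 9D; pairing each 2009 value with a category is an assumption read off the order of the labels in Figure 9C, and the function name `slope_segments` is hypothetical.

```python
# Shares of aggregate income (percent): (1962, 2009).
# The 2009 pairing is an assumption based on the label order in Figure 9C.
shares = {
    "Social Security": (30, 38),
    "Earnings": (28, 29),
    "Other": (18, 4),
    "Asset income": (15, 11),
    "Government employee pensions": (6, 9),
    "Private pensions": (3, 9),
}


def slope_segments(data):
    """Return (label, start, end, change, color) for each category."""
    segments = []
    for label, (start, end) in data.items():
        change = end - start
        # Blue (darker) for categories that increased, orange (lighter) otherwise.
        color = "blue" if change > 0 else "orange"
        segments.append((label, start, end, change, color))
    # Sort by the starting value so labels stack top-down on the left axis.
    return sorted(segments, key=lambda seg: seg[1], reverse=True)


segments = slope_segments(shares)
for label, start, end, change, color in segments:
    print(f"{label:<30} {start:>3} -> {end:>3}  ({change:+d}, {color})")
```

Each tuple can then be handed to any plotting tool as one line segment between the two vertical axes.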


A data set in pictures: Chetty et al., JAMA 2016

Copyright 2016 American Medical Association All rights reserved

women, the comparable changes were 0.23 years in the highest quartile and 0.10 years in the lowest quartile (P < .001). These differences persisted after controlling for the higher growth rate of income for individuals in the top quartile relative to the bottom quartile (eTable 4 in the Supplement).

The lower panels of Figure 3 show the annual increase in race-adjusted life expectancy by income ventiles. The annual increase in longevity was 0.18 years for men (which translates to an increase of 2.34 years from 2001-2014) and 0.22 years for women (an increase of 2.91 years from 2001-2014) in the top 5% of the income distribution. In the bottom 5% of the income distribution, the average annual increase in longevity was 0.02 years (an increase of 0.32 years from 2001-2014) for men and 0.003 years (an increase of 0.04 years from 2001-2014) for women (P < .001 for the differences between top and bottom 5% of income distributions for both sexes).
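The passage above pairs each annual increase with a cumulative increase over 2001-2014. As a rough consistency check, the sketch below divides each cumulative figure by 13 and rounds to the precision of the reported annual figure. It assumes the cumulative change is simply the annual trend times 13 yearly steps and that the figures read 0.18/2.34, 0.22/2.91, 0.02/0.32 and 0.003/0.04 with decimal points in place; the function name is hypothetical.

```python
YEARS = 2014 - 2001  # 13 yearly steps between the first and last year

# (group, reported annual increase, decimals shown, reported cumulative increase)
reported = [
    ("men, top 5%", 0.18, 2, 2.34),
    ("women, top 5%", 0.22, 2, 2.91),
    ("men, bottom 5%", 0.02, 2, 0.32),
    ("women, bottom 5%", 0.003, 3, 0.04),
]


def implied_annual(total, decimals):
    """Annual change implied by a cumulative change spread evenly over YEARS."""
    return round(total / YEARS, decimals)


for group, annual, decimals, total in reported:
    print(f"{group:<17} reported {annual}, implied {implied_annual(total, decimals)}")
```

Going the other way (annual times 13) does not always reproduce the cumulative figure, because the annual values are themselves rounded.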

Local Area Variation in Life Expectancy by Income

Levels of Life Expectancy by Commuting Zone

Life expectancy varied significantly across areas within the United States, especially for low-income individuals. Figure 4 shows life expectancy by income ventile for New York, New York; San Francisco, California; Dallas, Texas; and Detroit, Michigan. There was substantial variation across these areas for low-income individuals, but little variation for high-income individuals. Life expectancy ranged from 72.3 years to 78.6 years for men in the lowest income ventile across these 4 cities; the corresponding range for men in the top ventile was 86.5 years to 87.5 years.

The results in Figure 4 are representative of the variation across commuting zones more generally. The SD of life expectancy across all commuting zones (weighted by population) was 1.39 years for men in the bottom income quartile vs 0.70 years in the top income quartile (P < .001). Life expectancy varied less across areas for women than men in the bottom income quartile, and the amount of variation across commuting zones also declined with income for women (eTable 5 in the Supplement).

Figure 5 shows maps of expected age at death by commuting zone for men and women in the bottom and top quartiles of the national income distribution (maps for the middle-income quartiles appear in eFigure 10 in the Supplement). For individuals in the bottom income quartile, life expectancy differed by about 5 years for men and 4 years for women between the lowest and highest longevity commuting zones (P < .001 for both sexes). A summary of standard errors by commuting zone appears in part V.C of the eAppendix and in eFigure 11.

Nevada, Indiana, and Oklahoma had the lowest life expectancies (<77.9 years) when men and women in the bottom income quartile were averaged. Of the 10 states with the lowest levels of life expectancy for individuals in the bottom income quartile, 8 formed a geographic belt from Michigan to Kansas (Michigan, Ohio, Indiana, Kentucky, Tennessee, Arkansas, Oklahoma, Kansas). The states with the highest life expectancies for individuals in the bottom income quartile (>80.6 years) were California, New York, and Vermont. Life expectancy in the South was similar to the national mean for both sexes (−0.22 years [P = .47] for women and −0.96 years [P = .03] for men) in the bottom income quartile. Individuals in the top income quartile had the lowest life expectancies (<85.3 years) in Nevada, Hawaii, and Oklahoma. Individuals in the top income quartile had the highest life expectancies (>87.6 years) in Utah, Washington, DC, and Vermont.

Table 1 lists the top 10 and bottom 10 commuting zones in mean life expectancy (averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The expected age at death for the bottom quartile ranged from 74.2 years for men and 80.7 years for women in Gary, Indiana, to 79.5 years for men and 84.0 years for women in New York, New York. The commuting zones with the highest life expectancies

Figure 2: Race- and Ethnicity-Adjusted Life Expectancy for 40-Year-Olds by Household Income Percentile, 2001-2014

[Chart. Vertical axis: expected age at death for 40-year-olds, y (70 to 90). Horizontal axis: household income percentile (0 to 100), with mean household income in thousands, $ (a), shown separately for women and men. Series: Women; Men.]

Expected age at death, y
Women, by household income percentile: Bottom 1%: 78.8 (95% CI, 78.7-78.9); Top 1%: 88.9 (95% CI, 88.7-89.1)
Men, by household income percentile: Bottom 1%: 72.7 (95% CI, 72.6-72.9); Top 1%: 87.3 (95% CI, 87.2-87.5)

Life expectancies were calculated using survival curves analogous to those in Figure 1. The vertical height of each bar depicts the 95% confidence interval. The difference between expected age at death in the top and bottom income percentiles is 10.1 years (95% CI, 9.9-10.3 years) for women and 14.6 years (95% CI, 14.4-14.8 years) for men. To control for differences in life expectancies across racial and ethnic groups, race and ethnicity adjustments were calculated using data from the National Longitudinal Mortality Survey, and estimates were reweighted so that each income percentile bin has the same fraction of black, Hispanic, and Asian adults.
a Averaged across years and ages. The data are in thousands unless otherwise indicated.
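The top-to-bottom gaps quoted in the figure note can be recomputed from the four point estimates labeled on the chart. The short sketch below does only that subtraction; it does not reproduce the confidence intervals, which would need the underlying standard errors. The values assume the labels read 78.8, 88.9, 72.7 and 87.3 with decimal points in place, and the function name is hypothetical.

```python
# Expected age at death for 40-year-olds at the bottom 1% and top 1%
# of household income, as labeled on the figure.
estimates = {
    "women": {"bottom": 78.8, "top": 88.9},
    "men": {"bottom": 72.7, "top": 87.3},
}


def top_bottom_gap(group):
    """Top-minus-bottom difference in years, rounded to one decimal."""
    values = estimates[group]
    return round(values["top"] - values["bottom"], 1)


gaps = {group: top_bottom_gap(group) for group in estimates}
print(gaps)
```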

Association Between Income and Life Expectancy in the United States. Special Communication, Clinical Review & Education.

jama.com (Reprinted) JAMA, published online April 10, 2016, E5


A data set in pictures: Chetty et al., JAMA 2016, as rendered in NYT

A data set in pictures: Chetty et al., JAMA 2016

were clustered in California (6 of the top 10), whereas the commuting zones with the lowest life expectancies were clustered in the industrial Midwest (5 of the bottom 10). The commuting zones with the highest life expectancies for those in the bottom income quartile also had the smallest gaps in life expectancy between the top and bottom quartiles (r = −0.82, P < .001). The expected age at death for the top income quartile ranged from 82.8 years for men and 85.3 years for women in Las Vegas, Nevada, to 86.6 years for men and 89.0 years for women in Salt Lake City, Utah. The areas with the highest and lowest life expectancies for those in the top income quartile were less clustered geographically; for example, California had commuting zones in both the top 10 and bottom 10 of the list.

The differences in life expectancy across commuting zones were similar in analyses with income measures adjusted for cost of living, with controls for differences across areas in the income distribution within each quartile, and using measures of loss in life years up to the age of 77 years that did not make use of extrapolations beyond observed ages (part IV.C of the eAppendix and eTable 6 in the Supplement). There was also considerable variation in life expectancy across counties within commuting zones (part V of the eAppendix, eFigure 12, and eTable 7).

Trends in Life Expectancy

Similar to levels of life expectancy, temporal trends varied significantly across geographic areas. Figure 6 maps the annual change in life expectancy between 2001 and 2014 by state for men and women in the bottom income quartile. Hawaii, Maine, and Massachusetts had the largest gains in life expectancy (gaining >0.19 years annually) when men and women in the bottom income quartile were averaged. The states in which low-income individuals experienced the largest losses in life expectancy (losing >0.09 years annually) were Alaska, Iowa, and Wyoming.

Table 2 lists the top 10 and bottom 10 commuting zones in terms of trends in life expectancy (when averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The estimated trends for individuals in the bottom income quartile ranged from an annual gain of 0.38 years in Toms River, New Jersey, to an annual loss of 0.17 years in Tampa, Florida. Gaps in life expectancy between the bottom and top income quartiles generally declined or remained stable in areas in which the bottom income quartile experienced the largest gains in life expectancy, such as Toms River, New Jersey. In contrast, gaps in life expectancy between the top and bottom income quartiles increased by approximately 0.3 years annually in places such as Tampa, Florida.

Figure 7 shows race- and ethnicity-adjusted life expectancies by year for men and women in the bottom income quartile in 2 commuting zones in the top 10 (Birmingham, Alabama, and Cincinnati, Ohio) and 2 commuting zones in the bottom 10 (Knoxville, Tennessee, and Tampa, Florida). This figure shows that trends in life expectancy across these areas diverged continuously throughout the 2000s. For example, life expectancy increased by approximately 3.2 years from 2001 through 2014 for men and women in Cincinnati, Ohio, but declined by approximately 2.2 years in Tampa, Florida.

Correlates of Local Area Variation in Life Expectancy

Figure 8 shows correlations of commuting zone-level estimates of race- and ethnicity-adjusted life expectancy for the bottom income quartile with local area characteristics. The correlations are divided into 6 groups: health behaviors, access to health care, environmental factors, income inequality and social cohesion, local labor market conditions, and other factors. Data for men and women are combined; correlations were similar by sex (eTable 8 in the Supplement). County-level correlations were also similar (eTable 9).

Health Behaviors

Life expectancy was negatively correlated with rates of smoking (r = −0.69, P < .001) and obesity (r = −0.47, P < .001) and positively correlated with exercise rates (r = 0.32, P = .004) among individuals in the bottom income quartile. The maps for rates of smoking, obesity, and exercise among low-income individuals were similar to those for life expectancy (eFigure 13 in the Supplement).

Figure 4: Race- and Ethnicity-Adjusted Life Expectancy by Income Ventile in Selected Commuting Zones, 2001-2014

[Two panels: Men; Women. Vertical axis: expected age at death for 40-year-olds, y (70 to 90). Horizontal axis: household income ventile (0 to 20), with mean household income in thousands, $ (a). Series in each panel: Detroit, MI; New York, NY; San Francisco, CA; Dallas, TX.]

Estimates of race- and ethnicity-adjusted expected age at death for 40-year-olds computed by income ventile (5 percentile point bins).
a Averaged across years and ages.


Visualizing a research strategy

Figure 1: Pollution in China and the Huai River/Qinling Mountain Range

Notes: The cities shown are the locations of the Disease Surveillance Points. Cities north of the solid line were covered by the home heating policy. The figure coloring is generated by interpolating PM10 levels at the 12 nearest pollution monitoring stations to create a high-resolution grid of pollution throughout China (1 degree latitude cell width). Areas that are not within acceptable range of a station are left in white.

Figure 2: Particulate Matter Levels (PM10) South and North of the Huai River Boundary

[Scatter plot with fitted lines. Vertical axis: PM10 (µg/m3), 40 to 160. Horizontal axis: degrees north of the Huai River boundary, -20 to 20. Legend: PM10 in South; PM10 in North; Local Linear Regression.]

Notes: Each observation (circle) is generated by averaging PM10 across the Disease Surveillance Point locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

The estimated change in PM10 (and height of the brace) just north of the Huai River is 41.6 µg/m3 and is statistically significant (95% CI: 11.6, 71.6).
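The figure notes describe how each circle is built: locations are grouped into 1 degree latitude bins, the outcome is averaged with population weights, and the circle size follows the total population in the bin. A minimal Python sketch of that aggregation is below. The locations and values are hypothetical, the function name is an assumption, and the local linear regression fitted on each side of the boundary is not reproduced here.

```python
import math
from collections import defaultdict


def binned_weighted_means(points):
    """Population-weighted mean of a value within 1-degree latitude bins.

    points: iterable of (degrees_north_of_boundary, value, population).
    Returns {bin_start: (weighted_mean, total_population)}.
    """
    weighted_sum = defaultdict(float)
    population = defaultdict(float)
    for degrees, value, pop in points:
        bin_start = math.floor(degrees)  # e.g. -0.4 falls in the bin [-1, 0)
        weighted_sum[bin_start] += value * pop
        population[bin_start] += pop
    return {
        b: (weighted_sum[b] / population[b], population[b])
        for b in sorted(population)
    }


# Hypothetical locations: (degrees north of the boundary, PM10, population).
sample = [
    (-0.4, 100.0, 2_000_000),
    (-0.9, 120.0, 1_000_000),
    (0.3, 150.0, 3_000_000),
    (0.8, 130.0, 1_000_000),
]
bins = binned_weighted_means(sample)
for start, (mean, pop) in bins.items():
    print(f"[{start}, {start + 1}): mean = {mean:.1f}, population = {pop:,.0f}")
```

The second element of each result is what would drive the circle size in the plot.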


Figure 3: Life Expectancy South and North of the Huai River Boundary

[Scatter plot with fitted lines. Vertical axis: life expectancy (years), 70 to 85. Horizontal axis: degrees north of the Huai River boundary, -20 to 20. Legend: LE in South; LE in North; Local Linear Regression.]

Notes: Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

The estimated change in life expectancy (and height of the brace) just north of the Huai River is -3.1 years and is statistically significant (95% CI: -5.0, -1.3).

Tables

• Use the same principles:
  • Show the data
  • Don't lie about it
  • Focus
• Which translates into:
  • Report the important numbers, not all the coefficients
  • Keep the number of significant digits down
  • No vertical lines
  • Very few horizontal lines; don't box results (3 lines are usually enough)
  • When in doubt, align left
  • Report the statistics that the reader will need, not necessarily those that come by default
• Sample questions to ask yourself:
  • Do you need the R2?
  • Are there important tests to report (say, of equality between two coefficients)?
  • You may need the mean in the control group
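The layout rules above (few significant digits, no vertical lines, three horizontal rules, left alignment) can be sketched as a small plain-text formatter. This is an illustration only: the function `format_table` and the numbers passed to it are hypothetical, and in practice a table package would do this work.

```python
def format_table(header, rows, sig_digits=3):
    """Render rows as plain text with three horizontal rules and no vertical lines."""

    def fmt(cell):
        # Floats are cut to a few significant digits; everything else passes through.
        return f"{cell:.{sig_digits}g}" if isinstance(cell, float) else str(cell)

    cells = [[fmt(c) for c in row] for row in [header] + rows]
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]

    def line(row):
        # Every column is left-aligned and separated by two spaces.
        return "  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip()

    rule = "-" * (sum(widths) + 2 * (len(widths) - 1))
    body = [line(r) for r in cells[1:]]
    # Rule above the header, below the header, and at the bottom: three in total.
    return "\n".join([rule, line(cells[0]), rule] + body + [rule])


# Hypothetical numbers, for illustration only.
table = format_table(
    ["", "(1)", "(2)"],
    [
        ["Treatment", 0.0493712, -0.210377],
        ["Observations", 390, 390],
        ["Mean in control", 13.8042, 11.6911],
    ],
)
print(table)
```

The `g` format drops trailing zeros, so a coefficient such as -0.210377 prints as -0.21; use a fixed number of decimals instead if the columns should line up on the decimal point.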

Table 6: Impact of the reform on MGNREGS projects: Evidence from asset survey

                  Number registered          Number found
                  All projects   Ongoing     All projects   Ongoing
                  (1)            (2)         (3)            (4)
-------------------------------------------------------------------
Treatment         0.0494         -0.210      0.309          0.0271
                  (0.263)        (0.413)     (0.239)        (0.267)
Observations      390            390         385            385
Mean in control   13.80          11.69       11.79          9.819
-------------------------------------------------------------------

Note: The unit of observation is a Gram Panchayat (GP). The dependent variables are the number of projects registered in the public data portal (nrega.nic.in) on May 15, 2013 (1), the number of projects declared as ongoing in nrega.nic.in (2), and the number of registered (3) and ongoing (4) projects found by surveyors in June-July 2013. Out of 5,390 projects registered in nrega.nic.in for the 390 GPs of the survey sample, a random sample of 3,900 projects was surveyed (10 per GP). The number of projects found in the survey is scaled up by the ratio of registered projects to sampled projects. 5 GPs (28 projects) could not be surveyed. All specifications include district fixed effects.
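The scaling step in the note to Table 6 (projects found in the sample, multiplied by registered projects over sampled projects) can be written out as a one-line calculation. The sketch below uses a hypothetical GP and a hypothetical function name; it is not the authors' code.

```python
def scaled_projects_found(found_in_sample, registered, sampled):
    """Scale a count found in the sample up to the full list of registered projects."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return found_in_sample * registered / sampled


# Hypothetical GP: 14 projects registered, 10 sampled, 8 found by surveyors.
estimate = scaled_projects_found(found_in_sample=8, registered=14, sampled=10)
print(estimate)
```

With every GP sampled at 10 projects, the scaling factor differs across GPs only through the number of registered projects.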

Table 7: Impact of the reform on fake beneficiaries: Evidence from matching of nrega.nic.in job cards with SECC census

                  All job cards        Job cards with at least one working member
                  (as of April 2014)   Intervention period     Post intervention
                                       July 2012-March 2013    Apr 2013-March 2014
                  (1)                  (2)                     (3)
----------------------------------------------------------------------------------
Panel A: Match rate for job cards with one member only
Treatment         0.0187               0.0181                  0.0107
                  (0.00741)            (0.00766)               (0.00696)
Observations      3095                 2868                    2922
Mean in control   0.644                0.673                   0.698
----------------------------------------------------------------------------------
Panel B: Match rate for job cards with two members or more
Treatment         0.0135               0.0126                  0.0104
                  (0.00613)            (0.00764)               (0.00732)
Observations      3093                 2836                    2906
Mean in control   0.243                0.282                   0.286
----------------------------------------------------------------------------------

Note: The unit of observation is a GP. The dependent variable is the fraction of job cards from nrega.nic.in matched by name with households from the SECC census. A job card with two members or more is matched when at least two members have been matched by name with a census household. The nrega.nic.in data was extracted from the nrega.nic.in server; it covers the period from July 2011 to March 2014. Treatment is a dummy equal to one for the blocks selected for the intervention. All specifications include district fixed effects.

Stargazer

• In practice, in R, use the "stargazer" package
• It will create standardized tables and can output them for you in HTML, LaTeX, Word, or ASCII
• The default is not bad
• But you can customize it to choose what statistics to include, choose what coefficients to include, etc.
• https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf
• http://jakeruss.com/cheatsheets/stargazer.html

Default output

Table 1: Results

                                 Dependent variable: ftvoteshare
                                 (1)             (2)
fncandidates                     0.134***
                                 (0.007)
RESprior                                         0.059**
                                                 (0.024)
Constant                         0.012           0.094***
                                 (0.010)         (0.019)
Observations                     372             372
R2                               0.532           0.016
Adjusted R2                      0.530           0.013
Residual Std. Error (df = 370)   0.153           0.222
F Statistic (df = 1; 370)        420.093***      6.056**

Note: *p<0.1; **p<0.05; ***p<0.01

Improve on it

• Use transparent label names for dependent and independent variables
• Choose the coefficients you would like to include
• No need to include both R squared and adjusted R squared
• Maybe report the mean of the dependent variable in the control group rather than the constant
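As a rough illustration of these points, the Python sketch below lays out a one-column table with a readable label, a single coefficient, no R squared, and the control-group mean in place of the constant. The numbers are taken from Table 7, Panel A, column (1); the helper names, the table title, and the normal approximation used for the significance stars are assumptions made for this example and are not part of stargazer.

```python
import math


def p_value_normal(coef, se):
    """Two-sided p-value under a normal approximation (an assumption)."""
    return math.erfc(abs(coef / se) / math.sqrt(2))


def stars(p):
    """Conventional significance stars: * p<0.1, ** p<0.05, *** p<0.01."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


def render(title, header, body, footer, label_width=28, cell_width=12):
    """Plain-text table with exactly three horizontal rules and no vertical lines."""
    def row(label, cells):
        return label.ljust(label_width) + "".join(c.rjust(cell_width) for c in cells)

    rule = "-" * (label_width + cell_width * len(header))
    lines = [title, rule, row("", header), rule]
    lines += [row(label, cells) for label, cells in body + footer]
    lines.append(rule)
    return "\n".join(lines)


coef, se = 0.0187, 0.00741  # Table 7, Panel A, column (1)
table = render(
    "Match rate, job cards with one member",
    ["(1)"],
    [("Treatment", [f"{coef:.3f}{stars(p_value_normal(coef, se))}"]),
     ("", [f"({se:.3f})"])],
    [("Observations", ["3095"]), ("Mean in control", ["0.644"])],
)
print(table)
```

The same layout extends to several columns by passing longer header and cell lists.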

Visualizing regression results

Figure 1: Overview of Bandhan Results

Figure 2: Endline 1 - At a Glance

Tools and resources

• R is a great visualization tool (especially ggplot; you need to pick up a good book to choose colors, etc.)
• Yau's FlowingData has many tutorials on how to do things in R (http://flowingdata.com/category/tutorials) and a 4-week mini-course you can take at your own pace
• R handles and produces maps as well
• At this site you can find Tufte's charts in R: http://motioninsocial.com/tufte

References

• Tufte, Edward, The Visual Display of Quantitative Information, Graphics Press, 2013 (second edition).
• Schwabish, Jonathan, "An Economist's Guide to Visualizing Data," Journal of Economic Perspectives, 2014, vol. 28, no. 1.
• Banerjee, Chattopadhyay, Duflo, Shapiro, "The long term impact of a graduation program: Evidence from West Bengal," mimeo, MIT.
• Duflo, Esther, "Schooling and Labor Market Consequences of School Construction in Indonesia," American Economic Review, 2001.
• Chetty, Raj, Michael Stepner, Sarah Abraham, Shelby Lin, Benjamin Scuderi, Nicholas Turner, Augustin Bergeron, and David Cutler, "The association between income and life expectancy in the United States, 2001-2014," JAMA 315, no. 16 (2016): 1750-1766.
• Bui, Quoctrung, and Neil Irwin, "The Rich Live Longer Everywhere. For the Poor, Geography Matters," New York Times, April 11, 2016.
• Chen, Yuyu, Avraham Ebenstein, Michael Greenstone, and Hongbin Li, "Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China's Huai River Policy."
• Lee, David S., Enrico Moretti, and Matthew J. Butler, "Do voters affect or elect policies? Evidence from the US House," The Quarterly Journal of Economics (2004): 807-859.
• Klerman, Jacob Alex, and Caroline Danielson, "The Transformation of the Supplemental Nutrition Assistance Program," Journal of Policy Analysis and Management, vol. 30, no. 4, 2011, pp. 863-888.
• Stinebrickner, Ralph, and Todd Stinebrickner, "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model," Western University, CIBC Working Paper 2013-5 (July), http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf, 2013.
• Ottaviano, Gianmarco I. P., and Giovanni Peri, "Immigration and National Wages: Clarifying the Theory and the Empirics," NBER Working Paper 14188 (July), 2008.
• Social Security Advisory Board, Aspects of Disability Decision Making: Data and Materials, Social Security Administration (February), http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf, 2012.
• Social Security Administration, Fast Facts & Figures About Social Security, 2009, Social Security Administration (July), http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf, 2009.
• Hanson, Gordon H., "The Rise of Middle Kingdoms: Emerging Economies in Global Trade," Journal of Economic Perspectives 26(2): 41-64, 2012.

Figure 3A: The Basic Column Chart
[Column chart titled "Figure 2: Discounted Expected Lifetime Earnings, VN(t)". Vertical axis: income in thousands, 400 to 1100. Categories: finish no school, finish 1 yr, finish 3 yrs, grad 2.0 GPA, grad 3.0 GPA, grad 3.75 GPA.]
Source: Stinebrickner and Stinebrickner (2013).

Figure 3B: The Revised Column Chart
[Bar chart titled "Discounted Expected Lifetime Earnings, VN(t) (Income in thousands)". Horizontal axis: 0 to 1200. Categories: Finish no school, Finish 1 year, Finish 3 years, Graduate 2.0 GPA, Graduate 3.0 GPA, Graduate 3.75 GPA.]
Source: Author's calculations using numbers inferred from text in Stinebrickner and Stinebrickner (2013).

The 3D Chart

Figure 4A uses the now-familiar 3D effect. In such graphs, the third dimension does not plot data values, but it does add clutter to the chart and, worse, it can distort the information. Look at the far-right-hand bar, labeled 6 percent. No point of the column touches the gridline for that value. This software tool, like many others, uses perspective to give depth to the imaginary plane that runs across the top of the column, intersecting the gridline. But most readers will perceive the actual value of the column as less than 6 percent. Figure 4B shows a redesign: cancel the 3D treatment and integrate the disconnected legend with the graph. Notice that inserting the common baseline (portrayed in the original by a hovering, barely perceptible thin gray line) permits a more effective comparison among groups.

The Unbalanced Chart

The source material for Figure 5A originally appeared in an interactive visualization on the Organisation for Economic Co-operation and Development (OECD) website (http://www.oecd.org/gender/data/proportionofemployedwhoareseniormanagersbysex.htm); a static version was later reproduced in a New York Times Economix blog post (http://economix.blogs.nytimes.com/2013/04/02/comparing-the-worlds-glass-ceilings/?_r=2).

Figure 4A: A 3D Chart
[3D column chart: Change in real weekly wages of US-born workers by group, 1990-2006. Groups: Some High School, High School Graduate, Some College, College Graduate. Series: Young (experience below 20 years), Old (experience above 20 years).]
Source: Ottaviano and Peri (2008).

Figure 5A: An Unbalanced Chart
[Column chart: Percentage of Employed Who Are Senior Managers, by Sex, 2008. Series: Women, Men. Horizontal axis: OECD countries and the OECD average, running from the United States to Korea. Vertical axis: 0 to 20.]
Source: Author, based on OECD (no date) and Rampell (2013).

Figure 4B: Flattening a 3D Chart
[Column chart: Change in real weekly wages of US-born workers by group, 1990-2006 (Percent). Groups: Some High School, High School Graduate, Some College, College Graduate. Series, labeled directly on the chart: Young (experience below 20 years), Old (experience above 20 years). Vertical axis: -6 to 12.]

Figure 6A: A Spaghetti Chart
[Line chart: Initial DI Worker Awards by Major Cause of Disability, Calendar Years 1975-2010. Series: Mental, Cancer, Circulatory, Musculoskeletal. Vertical axis: 0 to 35.]
Source: Social Security Advisory Board (2012).

Figure 6B: Revising the Spaghetti Chart
[Small multiples, one panel per cause (Circulatory, Mental, Musculoskeletal, Cancer): Initial DI Worker Awards by Major Cause of Disability, Calendar Years 1975-2010 (Percent). Horizontal axis: 1975 to 2010, with start and end values labeled in each panel.]


in this case and is a useful approach when labels are difficult to fit in the vertical column chart layout (recall Figure 3; also see Schwabish 2013c). Also notice the (subjective) decision to omit the y-axis; the usefulness of the y-axis is doubtful with data labels placed on top of each column.

Figure 9A: Two Pie Charts for Comparison
[Two pie charts: Shares of Aggregate Income, 1962 and 2007 (aggregate income by source).
1962: Social Security 30, Earnings 28, Other 16, Asset income 15, Government employee pensions 6, Private pensions 3.
2007: Social Security 36, Earnings 29, Asset income 16, Private pensions 9, Government employee pensions 8, Other 3.]
Source: Social Security Administration (2009).

Figure 9B: Alternative to a Pie Chart: A Paired Column Chart
[Paired column chart: Shares of Aggregate Income, 1962 and 2009 (Percent). Categories: Social Security, Earnings, Asset income, Private pensions, Government employee pensions, Other. Series: 1962, 2009. Data labels are placed on top of each column and the y-axis is omitted.]


Alternatively, the stacked bar chart in Figure 9C shows the distribution of the various groups and that the groups sum to 100 percent, while also highlighting differences from one year to the other. Finally, the slope chart in Figure 9D also shows the difference in each category from the first year to the last by pairing points on two vertical axes. Slope charts can be used for a variety of purposes, including showing correlations, for example, the relationship between a state's obesity rate and the share of people with at least a bachelor's degree (Cairo 2013). In this example, the color contrast (or what appears as different shades of grey in the black-and-white printed version) identifies which categories increased over time (blue, darker) and those that declined (orange, lighter).
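The color rule of the slope chart can be expressed as a small Python sketch. The shares below are read off the data labels of Figures 9C and 9D; the function name and the "grey" fallback for an unchanged share are assumptions added for illustration, and no plotting library is involved.

```python
# Shares of aggregate income (percent), as (1962, 2009) pairs
# read off the data labels of Figures 9C and 9D.
shares = {
    "Social Security": (30, 38),
    "Earnings": (28, 29),
    "Other": (18, 4),
    "Asset income": (15, 11),
    "Government employee pensions": (6, 9),
    "Private pensions": (3, 9),
}


def slope_color(start, end):
    """Blue for categories that increased, orange for those that declined."""
    if end > start:
        return "blue"
    if end < start:
        return "orange"
    return "grey"  # assumption: the article does not cover unchanged shares


colors = {name: slope_color(start, end) for name, (start, end) in shares.items()}

# The stacked bar chart relies on each year summing to 100 percent.
total_1962 = sum(start for start, _ in shares.values())
total_2009 = sum(end for _, end in shares.values())

for name, (start, end) in shares.items():
    print(f"{name}: {start} -> {end} ({colors[name]})")
```

Checking that each year sums to 100 is a cheap guard against mistyped labels before drawing either chart.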

Figure 9C: Alternative to a Pie Chart: A Stacked Bar Chart
[Two stacked bars, 1962 and 2009: Shares of Aggregate Income (Percent). Segments, 1962 then 2009: Social Security 30, 38; Earnings 28, 29; Other 18, 4; Asset income 15, 11; Government employee pensions 6, 9; Private pensions 3, 9.]

Figure 9D: Alternative to a Pie Chart: The Slope Chart
[Slope chart: Shares of Aggregate Income, 1962 and 2009 (Percent). Left axis (1962): Social Security 30, Earnings 28, Other 18, Asset income 15, Government employee pensions 6, Private pensions 3. Right axis (2009): 38, 29, 11, 9, 9, 4.]


A data set in pictures: Chetty et al., JAMA 2016

Copyright 2016 American Medical Association All rights reserved

women, the comparable changes were 0.23 years in the highest quartile and 0.10 years in the lowest quartile (P < .001). These differences persisted after controlling for the higher growth rate of income for individuals in the top quartile relative to the bottom quartile (eTable 4 in the Supplement).

The lower panels of Figure 3 show the annual increase in race-adjusted life expectancy by income ventiles. The annual increase in longevity was 0.18 years for men (which translates to an increase of 2.34 years from 2001-2014) and 0.22 years for women (an increase of 2.91 years from 2001-2014) in the top 5% of the income distribution. In the bottom 5% of the income distribution, the average annual increase in longevity was 0.02 years (an increase of 0.32 years from 2001-2014) for men and 0.003 years (an increase of 0.04 years from 2001-2014) for women (P < .001 for the differences between top and bottom 5% of income distributions for both sexes).

Local Area Variation in Life Expectancy by Income

Levels of Life Expectancy by Commuting Zone

Life expectancy varied significantly across areas within the United States, especially for low-income individuals. Figure 4 shows life expectancy by income ventile for New York, New York; San Francisco, California; Dallas, Texas; and Detroit, Michigan. There was substantial variation across these areas for low-income individuals, but little variation for high-income individuals. Life expectancy ranged from 72.3 years to 78.6 years for men in the lowest income ventile across these 4 cities; the corresponding range for men in the top ventile was 86.5 years to 87.5 years.

The results in Figure 4 are representative of the variation across commuting zones more generally. The SD of life expectancy across all commuting zones (weighted by population) was 1.39 years for men in the bottom income quartile vs 0.70 years in the top income quartile (P < .001). Life expectancy varied less across areas for women than men in the bottom income quartile, and the amount of variation across commuting zones also declined with income for women (eTable 5 in the Supplement).

Figure 5 shows maps of expected age at death by commuting zone for men and women in the bottom and top quartiles of the national income distribution (maps for the middle-income quartiles appear in eFigure 10 in the Supplement). For individuals in the bottom income quartile, life expectancy differed by about 5 years for men and 4 years for women between the lowest and highest longevity commuting zones (P < .001 for both sexes). A summary of standard errors by commuting zone appears in part V.C of the eAppendix and in eFigure 11.

Nevada, Indiana, and Oklahoma had the lowest life expectancies (<77.9 years) when men and women in the bottom income quartile were averaged. Of the 10 states with the lowest levels of life expectancy for individuals in the bottom income quartile, 8 formed a geographic belt from Michigan to Kansas (Michigan, Ohio, Indiana, Kentucky, Tennessee, Arkansas, Oklahoma, Kansas). The states with the highest life expectancies for individuals in the bottom income quartile (>80.6 years) were California, New York, and Vermont. Life expectancy in the South was similar to the national mean for both sexes (-0.22 years [P = .47] for women and -0.96 years [P = .03] for men) in the bottom income quartile. Individuals in the top income quartile had the lowest life expectancies (<85.3 years) in Nevada, Hawaii, and Oklahoma. Individuals in the top income quartile had the highest life expectancies (>87.6 years) in Utah, Washington, DC, and Vermont.

Table 1 lists the top 10 and bottom 10 commuting zones in mean life expectancy (averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The expected age at death for the bottom quartile ranged from 74.2 years for men and 80.7 years for women in Gary, Indiana, to 79.5 years for men and 84.0 years for women in New York, New York. The commuting zones with the highest life expectancies

Figure 2: Race- and Ethnicity-Adjusted Life Expectancy for 40-Year-Olds by Household Income Percentile, 2001-2014
[Plot: expected age at death for 40-year-olds (y), 70 to 90, against household income percentile (0 to 100), with mean household income in thousands of dollars shown below the axis. Series: Women, Men.]

Expected age at death, y
Women, by household income percentile: bottom 1%: 78.8 (95% CI, 78.7-78.9); top 1%: 88.9 (95% CI, 88.7-89.1)
Men, by household income percentile: bottom 1%: 72.7 (95% CI, 72.6-72.9); top 1%: 87.3 (95% CI, 87.2-87.5)

Life expectancies were calculated using survival curves analogous to those in Figure 1. The vertical height of each bar depicts the 95% confidence interval. The difference between expected age at death in the top and bottom income percentiles is 10.1 years (95% CI, 9.9-10.3 years) for women and 14.6 years (95% CI, 14.4-14.8 years) for men. To control for differences in life expectancies across racial and ethnic groups, race and ethnicity adjustments were calculated using data from the National Longitudinal Mortality Survey, and estimates were reweighted so that each income percentile bin has the same fraction of black, Hispanic, and Asian adults.
a Averaged across years and ages. The data are in thousands unless otherwise indicated.

Association Between Income and Life Expectancy in the United States. Special Communication, Clinical Review & Education. jama.com (Reprinted), JAMA, published online April 10, 2016, E5.
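The headline numbers on this page can be checked with simple arithmetic. The Python sketch below recomputes the top-versus-bottom gaps from the Figure 2 point estimates and converts an annual gain into a cumulative gain over 2001-2014. Treating that period as 13 annual steps is an assumption inferred from the text (0.18 x 13 = 2.34); the other published totals appear to use unrounded annual rates, so they need not match the rounded product exactly.

```python
YEARS = 2014 - 2001  # assumption: 13 annual steps between 2001 and 2014


def cumulative_gain(annual_gain, years=YEARS):
    """Total change implied by a constant annual change."""
    return annual_gain * years


def gap(top, bottom):
    """Difference in expected age at death, rounded to one decimal."""
    return round(top - bottom, 1)


# Point estimates from Figure 2 (top 1% vs bottom 1% of household income).
gap_women = gap(88.9, 78.8)
gap_men = gap(87.3, 72.7)

# Annual gain for men in the top 5% of the income distribution.
gain_men_top = cumulative_gain(0.18)

print(gap_women, gap_men, round(gain_men_top, 2))
```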

A data set in pictures: Chetty et al., JAMA 2016, as rendered in NYT

A data set in pictures: Chetty et al., JAMA 2016

were clustered in California (6 of the top 10), whereas the commuting zones with the lowest life expectancies were clustered in the industrial Midwest (5 of the bottom 10). The commuting zones with the highest life expectancies for those in the bottom income quartile also had the smallest gaps in life expectancy between the top and bottom quartiles (r = -0.82, P < .001). The expected age at death for the top income quartile ranged from 82.8 years for men and 85.3 years for women in Las Vegas, Nevada, to 86.6 years for men and 89.0 years for women in Salt Lake City, Utah. The areas with the highest and lowest life expectancies for those in the top income quartile were less clustered geographically; for example, California had commuting zones in both the top 10 and bottom 10 of the list.

The differences in life expectancy across commuting zones were similar in analyses with income measures adjusted for cost of living, with controls for differences across areas in the income distribution within each quartile, and using measures of loss in life years up to the age of 77 years that did not make use of extrapolations beyond observed ages (part IV.C of the eAppendix and eTable 6 in the Supplement). There was also considerable variation in life expectancy across counties within commuting zones (part V of the eAppendix, eFigure 12, and eTable 7).

Trends in Life Expectancy

Similar to levels of life expectancy, temporal trends varied significantly across geographic areas. Figure 6 maps the annual change in life expectancy between 2001 and 2014 by state for men and women in the bottom income quartile. Hawaii, Maine, and Massachusetts had the largest gains in life expectancy (gaining >0.19 years annually) when men and women in the bottom income quartile were averaged. The states in which low-income individuals experienced the largest losses in life expectancy (losing >0.09 years annually) were Alaska, Iowa, and Wyoming.

Table 2 lists the top 10 and bottom 10 commuting zones in terms of trends in life expectancy (when averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The estimated trends for individuals in the bottom income quartile ranged from an annual gain of 0.38 years in Toms River, New Jersey, to an annual loss of 0.17 years in Tampa, Florida. Gaps in life expectancy between the bottom and top income quartiles generally declined or remained stable in areas in which the bottom income quartile experienced the largest gains in life expectancy, such as Toms River, New Jersey. In contrast, gaps in life expectancy between the top and bottom income quartiles increased by approximately 0.3 years annually in places such as Tampa, Florida.

Figure 7 shows race- and ethnicity-adjusted life expectancies by year for men and women in the bottom income quartile in 2 commuting zones in the top 10 (Birmingham, Alabama, and Cincinnati, Ohio) and 2 commuting zones in the bottom 10 (Knoxville, Tennessee, and Tampa, Florida). This figure shows that trends in life expectancy across these areas diverged continuously throughout the 2000s. For example, life expectancy increased by approximately 3.2 years from 2001 through 2014 for men and women in Cincinnati, Ohio, but declined by approximately 2.2 years in Tampa, Florida.

Correlates of Local Area Variation in Life Expectancy

Figure 8 shows correlations of commuting zone-level estimates of race- and ethnicity-adjusted life expectancy for the bottom income quartile with local area characteristics. The correlations are divided into 6 groups: health behaviors, access to health care, environmental factors, income inequality and social cohesion, local labor market conditions, and other factors. Data for men and women are combined; correlations were similar by sex (eTable 8 in the Supplement). County-level correlations were also similar (eTable 9).

Health Behaviors

Life expectancy was negatively correlated with rates of smoking (r = -0.69, P < .001) and obesity (r = -0.47, P < .001) and positively correlated with exercise rates (r = 0.32, P = .004) among individuals in the bottom income quartile. The maps for rates of smoking, obesity, and exercise among low-income individuals were similar to those for life expectancy (eFigure 13 in the Supplement).

Figure 4: Race- and Ethnicity-Adjusted Life Expectancy by Income Ventile in Selected Commuting Zones, 2001-2014
[Two panels, Men and Women: expected age at death for 40-year-olds (y), 70 to 90, against household income ventile (0 to 20), with mean household income in thousands of dollars shown below the axis. Series: Detroit, MI; New York, NY; San Francisco, CA; Dallas, TX.]

Estimates of race- and ethnicity-adjusted expected age at death for 40-year-olds, computed by income ventile (5 percentile point bins).
a Averaged across years and ages.

Visualizing a research strategy

Figure 1: Pollution in China and the Huai River/Qinling Mountain Range

Notes: The cities shown are the locations of the Disease Surveillance Points. Cities north of the solid line were covered by the home heating policy. The figure coloring is generated by interpolating PM10 levels at the 12 nearest pollution monitoring stations to create a high-resolution grid of pollution throughout China (1 degree latitude cell width). Areas that are not within acceptable range of a station are left in white.

Visualizing a research strategy

Figure 2: Particulate Matter Levels (PM10) South and North of the Huai River Boundary
[Scatter plot with fitted lines. Vertical axis: PM10 (µg/m3), 40 to 160. Horizontal axis: Degrees North of the Huai River Boundary, -20 to 20. Series: PM10 in South, PM10 in North, Local Linear Regression.]

Notes: Each observation (circle) is generated by averaging PM10 across the Disease Surveillance Point locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

The estimated change in PM10 (and height of the brace) just north of the Huai River is 41.6 µg/m3 and is statistically significant (95% CI: 11.6, 71.6).

Visualizing a research strategy

Figure 3: Life Expectancy South and North of the Huai River Boundary
[Scatter plot with fitted lines. Vertical axis: Life Expectancy (Years), 70 to 85. Horizontal axis: Degrees North of the Huai River Boundary, -20 to 20. Series: LE in South, LE in North, Local Linear Regression.]

Notes: Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

The estimated change in Life Expectancy (and height of the brace) just north of the Huai River is -3.1 years and is statistically significant (95% CI: -5.0, -1.3).
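The circles in Figures 2 and 3 are population-weighted bin averages. A minimal Python sketch of that binning step follows; it is not the authors' code. The choice of bin edges (floor of the latitude offset) and the sample points are assumptions for illustration, and the local linear regression on each side of the boundary is not reproduced here.

```python
import math
from collections import defaultdict


def weighted_bins(points):
    """Population-weighted mean per 1 degree latitude bin.

    points: iterable of (degrees_north, value, population), one per location.
    Returns {bin_start: (weighted_mean, total_population)}; the total
    population is what would set the circle size in the figure.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for degrees_north, value, population in points:
        bin_start = math.floor(degrees_north)  # assumption: bins are [k, k + 1)
        sums[bin_start][0] += value * population
        sums[bin_start][1] += population
    return {b: (wsum / pop, pop) for b, (wsum, pop) in sorted(sums.items())}


# Hypothetical locations: (degrees north of the boundary, PM10, population).
sample = [(-1.5, 100.0, 2.0), (-1.2, 130.0, 1.0), (0.3, 150.0, 1.0), (0.8, 170.0, 3.0)]
bins = weighted_bins(sample)
print(bins)
```

Negative bin starts lie south of the boundary and non-negative ones north of it, so the two sides can be fitted separately.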

Tables

• Use the same principles:
  • Show the data
  • Don't lie about it
  • Focus
• Which translates into:
  • Report the important numbers, not all the coefficients
  • Keep the number of significant digits down
  • No vertical lines
  • Very few horizontal lines; don't box results (3 lines are usually enough)
  • If in doubt, align left
  • Report the statistics that the reader will need, not necessarily those that come by default
• Sample questions to ask yourself:
  • Do you need the R²?
  • Are there important tests to report (say, of equality between two coefficients)?
  • You may need the mean in the control group
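One way to keep the number of significant digits down is to round every reported number to a fixed count of significant digits before it goes into the table. The helper below is a minimal sketch; the name `round_sig` is made up for this illustration and is not part of any package mentioned in these slides.

```python
from math import floor, log10

def round_sig(x, sig=2):
    """Round x to `sig` significant digits."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - floor(log10(abs(x))))

print(round_sig(0.0494))      # 0.049
print(round_sig(0.263))       # 0.26
print(round_sig(420.093, 3))  # 420.0
```

In a real table you would usually also want a coefficient and its standard error to share the same number of decimals, which this sketch does not enforce.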

Table 6: Impact of the reform on MGNREGS projects: Evidence from asset survey

                        Number registered            Number found
                     All projects    Ongoing    All projects    Ongoing
                         (1)           (2)          (3)           (4)
Treatment               0.0494       -0.210        0.309         0.0271
                       (0.263)       (0.413)      (0.239)       (0.267)
Observations             390           390          385           385
Mean in control         13.80         11.69        11.79         9.819

Note: The unit of observation is a Gram Panchayat (GP). The dependent variables are: the number of projects registered in the public data portal (nrega.nic.in) on May 15, 2013 (1); the number of projects declared as ongoing in nrega.nic.in (2); and the number of registered (3) and ongoing (4) projects found by surveyors in June-July 2013. Out of 5,390 projects registered in nrega.nic.in for the 390 GPs of the survey sample, a random sample of 3,900 projects was surveyed (10 per GP). The number of projects found in the survey is scaled up using the number of registered projects divided by the number of sampled projects. 5 GPs (28 projects) could not be surveyed. All specifications include district fixed effects.
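The scaling described in the note can be written out as a one-line formula: the count found in the sample is multiplied by registered projects over sampled projects. The sketch below uses made-up numbers for a single GP; the function name and the inputs are hypothetical, not taken from the paper's code or data.

```python
def scaled_projects_found(found_in_sample, registered, sampled):
    """Scale the number of projects found among the sampled ones
    up to the full set of registered projects in a GP."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return found_in_sample * registered / sampled

# Hypothetical GP: 14 registered projects, 10 sampled, 8 found by surveyors.
estimate = scaled_projects_found(found_in_sample=8, registered=14, sampled=10)
print(estimate)  # 11.2
```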

Table 7: Impact of the reform on fake beneficiaries: Evidence from matching of nrega.nic.in job cards with SECC census

                      All job cards         Job cards with at least one working member
                    (as of April 2014)    Intervention period       Post intervention
                                          July 2012-March 2013    Apr 2013-March 2014
                           (1)                    (2)                     (3)

Panel A: Match rate for job cards with one member only
Treatment                0.0187                 0.0181                  0.0107
                       (0.00741)              (0.00766)               (0.00696)
Observations             3,095                  2,868                   2,922
Mean in control          0.644                  0.673                   0.698

Panel B: Match rate for job cards with two members or more
Treatment                0.0135                 0.0126                  0.0104
                       (0.00613)              (0.00764)               (0.00732)
Observations             3,093                  2,836                   2,906
Mean in control          0.243                  0.282                   0.286

Note: The unit of observation is a GP. The dependent variable is the fraction of job cards from nrega.nic.in matched by name with households from the SECC census. A job card with two members or more is matched when at least two members have been matched by name with a census household. The nrega.nic.in data was extracted from the nrega.nic.in server; it covers the period from July 2011 to March 2014. Treatment is a dummy equal to one for the blocks selected for the intervention. All specifications include district fixed effects.

Stargazer

• In practice, in R, use the "stargazer" package
• It will create standardized tables and can output them for you in HTML, LaTeX, Word, or ASCII
• The default is not bad
• But you can customize it to choose which statistics to include, which coefficients to include, etc.
• https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf
• http://jakeruss.com/cheatsheets/stargazer.html

Default output

Table 1: Results

                                     Dependent variable:
                                         ftvoteshare
                                     (1)            (2)
fncandidates                       0.134***
                                   (0.007)
RESprior                                          0.059**
                                                  (0.024)
Constant                            0.012         0.094***
                                   (0.010)        (0.019)
Observations                         372            372
R²                                  0.532          0.016
Adjusted R²                         0.530          0.013
Residual Std. Error (df = 370)      0.153          0.222
F Statistic (df = 1; 370)         420.093***      6.056**

Note: *p<0.1; **p<0.05; ***p<0.01

Improve on it

• Use transparent label names for dependent and independent variables
• Choose the coefficients you would like to include
• No need to include both R squared and adjusted R squared
• You may need the mean of the dependent variable in the control group rather than the constant
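The improvements listed above (readable labels, a chosen subset of coefficients, few horizontal rules) can be illustrated without any particular package. The sketch below is in plain Python rather than R and is not how stargazer works internally. The function `render_table` and the label "Number of candidates" are hypothetical; the numbers are those of column (1) in the default output.

```python
def render_table(columns, coefficients, stats, labels, keep,
                 label_width=26, col_width=12):
    """Plain-text regression table with three horizontal rules and no vertical lines."""
    rule = "-" * (label_width + col_width * len(columns))

    def row(label, cells):
        return f"{label:<{label_width}}" + "".join(f"{c:>{col_width}}" for c in cells)

    lines = [rule, row("", columns), rule]
    for name in keep:  # only the coefficients chosen for display
        estimates = [f"{est:.3f}" for est, _ in coefficients[name]]
        std_errors = [f"({se:.3f})" for _, se in coefficients[name]]
        lines.append(row(labels.get(name, name), estimates))
        lines.append(row("", std_errors))
    for name, cells in stats.items():
        lines.append(row(name, cells))
    lines.append(rule)
    return "\n".join(lines)

table = render_table(
    columns=["(1)"],
    coefficients={"fncandidates": [(0.134, 0.007)], "Constant": [(0.012, 0.010)]},
    stats={"Observations": ["372"], "R-squared": ["0.532"]},
    labels={"fncandidates": "Number of candidates"},  # hypothetical readable label
    keep=["fncandidates"],                            # the constant is left out
)
print(table)
```

A row for the control-group mean would be added through `stats` in the same way as the observation count, once that number is available.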

Visualizing regression results

Figure 1: Overview of Bandhan Results

Figure 2: Endline 1, At a Glance

Tools and resources

• R is a great visualization tool (especially ggplot; you need to pick up a good book to choose colors, etc.)
• Yau's FlowingData has many tutorials on how to do things in R (http://flowingdata.com/category/tutorials) and a 4-week mini-course you can take at your own pace
• R handles and produces maps as well
• At this site you can find Tufte's charts in R: http://motioninsocial.com/tufte/

References

• Tufte, Edward. The Visual Display of Quantitative Information. Graphics Press, 2013 (second edition).
• Schwabish, Jonathan. "An Economist's Guide to Visualizing Data." Journal of Economic Perspectives, 2014, vol. 28, no. 1.
• Banerjee, Chattopadhyay, Duflo, and Shapiro. "The Long Term Impact of a Graduation Program: Evidence from West Bengal." Mimeo, MIT.
• Duflo, Esther. "Schooling and Labor Market Consequences of School Construction in Indonesia." American Economic Review, 2001.
• Chetty, Raj, Michael Stepner, Sarah Abraham, Shelby Lin, Benjamin Scuderi, Nicholas Turner, Augustin Bergeron, and David Cutler. "The Association Between Income and Life Expectancy in the United States, 2001-2014." JAMA 315, no. 16 (2016): 1750-1766.
• Bui, Quoctrung, and Neil Irwin. "The Rich Live Longer Everywhere. For the Poor, Geography Matters." New York Times, April 11, 2016.
• Chen, Yuyu, Avraham Ebenstein, Michael Greenstone, and Hongbin Li. "Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China's Huai River Policy."
• Lee, David S., Enrico Moretti, and Matthew J. Butler. "Do Voters Affect or Elect Policies? Evidence from the U.S. House." The Quarterly Journal of Economics (2004): 807-859.
• Klerman, Jacob Alex, and Caroline Danielson. "The Transformation of the Supplemental Nutrition Assistance Program." Journal of Policy Analysis and Management 30(4): 863-888, 2011.
• Stinebrickner, Ralph, and Todd Stinebrickner. "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model." Western University, CIBC Working Paper 2013-5 (July). http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf. 2013.
• Ottaviano, Gianmarco I. P., and Giovanni Peri. "Immigration and National Wages: Clarifying the Theory and the Empirics." NBER Working Paper 14188 (July), 2008.
• Social Security Advisory Board. Aspects of Disability Decision Making: Data and Materials. Social Security Administration (February). http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf. 2012.
• Social Security Administration. Fast Facts & Figures About Social Security, 2009. Social Security Administration (July). http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf. 2009.
• Hanson, Gordon H. 2012. "The Rise of Middle Kingdoms: Emerging Economies in Global Trade." Journal of Economic Perspectives 26(2): 41-64.


The 3D Chart

Figure 4A uses the now-familiar 3D effect. In such graphs, the third dimension does not plot data values, but it does add clutter to the chart and, worse, it can distort the information. Look at the far-right-hand bar, labeled 6 percent. No point of the column touches the gridline for that value. This software tool, like many others, uses perspective to give depth to the imaginary plane that runs across the top of the column, intersecting the gridline. But most readers will perceive the actual value of the column as less than 6 percent. Figure 4B shows a redesign: cancel the 3D treatment and integrate the disconnected legend with the graph. Notice that inserting the common baseline (portrayed in the original by a hovering, barely perceptible thin gray line) permits a more effective comparison among groups.

The Unbalanced Chart

The source material for Figure 5A originally appeared in an interactive visualization on the Organisation for Economic Co-operation and Development (OECD) website (http://www.oecd.org/gender/data/proportionofemployedwhoareseniormanagersbysex.htm); a static version was later reproduced in a New York Times Economix blog post (http://economix.blogs.nytimes.com/2013/04/02/comparing-the-worlds-glass-ceilings/?_r=2).

Figure 4A: A 3D Chart

[Figure: 3D column chart, "Change in real weekly wages of US-born workers by group, 1990-2006". Groups: Some High School, High School Graduate, Some College, College Graduate. Series: Young (experience below 20 years), Old (experience above 20 years). Vertical axis from -6.0 to 12.0.]

Source: Ottaviano and Peri (2008).

Figure 5A: An Unbalanced Chart

[Figure: paired column chart, "Percentage of Employed Who Are Senior Managers, by Sex, 2008". Series: Women, Men. Horizontal axis: OECD countries from the United States to Korea, plus the OECD average. Vertical axis from 0 to 20.]

Source: Author, based on OECD (no date) and Rampell (2013).

Figure 4B: Flattening a 3D Chart

[Figure: flat column chart, "Change in real weekly wages of US-born workers by group, 1990-2006 (Percent)". Groups: Some High School, High School Graduate, Some College, College Graduate. Series: Young (experience below 20 years), Old (experience above 20 years). Vertical axis from -6 to 12, with data labels on the columns and the legend integrated into the graph.]

Figure 6A: A Spaghetti Chart

[Figure: line chart, "Initial DI Worker Awards by Major Cause of Disability, Calendar Years 1975-2010". Series: Mental, Cancer, Circulatory, Musculoskeletal. Vertical axis from 0 to 35; horizontal axis from 1975 to 2010.]

Source: Social Security Advisory Board (2012).

Figure 6B: Revising the Spaghetti Chart

[Figure: small-multiple line charts, "Initial DI Worker Awards by Major Cause of Disability, Calendar Years 1975-2010 (Percent)". One panel per cause (Circulatory, Mental, Musculoskeletal, Cancer), each running from 1975 to 2010 with start and end values labeled.]


in this case and is a useful approach when labels are difficult to fit in the vertical column chart layout (recall Figure 3; also see Schwabish 2013c). Also notice the (subjective) decision to omit the y-axis; the usefulness of the y-axis is doubtful with data labels placed on top of each column.

Figure 9A: Two Pie Charts for Comparison

[Figure: two pie charts, "Shares of Aggregate Income, 1962 and 2007" (aggregate income by source).]

Source of income                 1962    2007
Social Security                    30      36
Earnings                           28      29
Asset income                       15      16
Government employee pensions        6       8
Private pensions                    3       9
Other                              16       3

Source: Social Security Administration (2009).

Figure 9B: Alternative to a Pie Chart: A Paired Column Chart

[Figure: paired column chart, "Shares of Aggregate Income, 1962 and 2009 (Percent)", with data labels on top of each column and no y-axis.]

Source of income                 1962    2009
Social Security                    30      38
Earnings                           28      29
Asset income                       15      11
Private pensions                    3       9
Government employee pensions        6       9
Other                              18       4


Alternatively, the stacked bar chart in Figure 9C shows the distribution of the various groups and that the groups sum to 100 percent, while also highlighting differences from one year to the other. Finally, the slope chart in Figure 9D also shows the difference in each category from the first year to the last by pairing points on two vertical axes. Slope charts can be used for a variety of purposes, including showing correlations, for example, the relationship between a state's obesity rate and the share of people with at least a bachelor's degree (Cairo 2013). In this example, the color contrast (or what appears as different shades of grey in the black-and-white printed version) identifies which categories increased over time (blue, darker) and those that declined (orange, lighter).

Figure 9C: Alternative to a Pie Chart: A Stacked Bar Chart

[Figure: two stacked bars, 1962 and 2009, "Shares of Aggregate Income, 1962 and 2009 (Percent)". Segments (1962, 2009): Social Security (30, 38), Earnings (28, 29), Other (18, 4), Asset income (15, 11), Government employee pensions (6, 9), Private pensions (3, 9).]

Figure 9D: Alternative to a Pie Chart: The Slope Chart

[Figure: slope chart, "Shares of Aggregate Income, 1962 and 2009 (Percent)". Each category is a line from its 1962 share on the left axis to its 2009 share on the right axis.]
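The colour rule in the slope chart comes down to the sign of the change between the two years. The sketch below computes that change for each category. The 1962 and 2009 shares are read off the figure labels as they survive in this transcript, so treat the pairing of numbers to categories as an assumption rather than verified data.

```python
# (share in 1962, share in 2009), in percent; pairing inferred from the figure labels.
shares = {
    "Social Security": (30, 38),
    "Earnings": (28, 29),
    "Asset income": (15, 11),
    "Private pensions": (3, 9),
    "Government employee pensions": (6, 9),
    "Other": (18, 4),
}

# Change in percentage points from the first year to the last.
changes = {name: last - first for name, (first, last) in shares.items()}

# Categories that would be drawn in the "increased" and "declined" colours.
increased = sorted(name for name, delta in changes.items() if delta > 0)
declined = sorted(name for name, delta in changes.items() if delta < 0)

for name, delta in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<30}{delta:+d}")
```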


A data set in pictures: Chetty et al., JAMA 2016

women, the comparable changes were 0.23 years in the highest quartile and 0.10 years in the lowest quartile (P < .001). These differences persisted after controlling for the higher growth rate of income for individuals in the top quartile relative to the bottom quartile (eTable 4 in the Supplement).

The lower panels of Figure 3 show the annual increase in race-adjusted life expectancy by income ventiles. The annual increase in longevity was 0.18 years for men (which translates to an increase of 2.34 years from 2001-2014) and 0.22 years for women (an increase of 2.91 years from 2001-2014) in the top 5% of the income distribution. In the bottom 5% of the income distribution, the average annual increase in longevity was 0.02 years (an increase of 0.32 years from 2001-2014) for men and 0.003 years (an increase of 0.04 years from 2001-2014) for women (P < .001 for the differences between top and bottom 5% of income distributions for both sexes).
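The annual and cumulative figures in this paragraph are linked by the 13 years between 2001 and 2014. As a quick check, and assuming the decimals read 2.34, 2.91, 0.32 and 0.04 years, dividing each cumulative gain by 13 and rounding reproduces the quoted annual rates. The dictionary layout is just for illustration.

```python
YEARS = 2014 - 2001  # 13 years between the first and last year

# Cumulative gains in life expectancy (years), as quoted in the text.
cumulative_gain = {
    ("men", "top 5%"): 2.34,
    ("women", "top 5%"): 2.91,
    ("men", "bottom 5%"): 0.32,
    ("women", "bottom 5%"): 0.04,
}

# Implied average annual gain for each group.
annual_gain = {group: gain / YEARS for group, gain in cumulative_gain.items()}

for (sex, bracket), rate in annual_gain.items():
    print(f"{sex:<6}{bracket:<10}{rate:.3f} years per year")
```

Going the other way, from the rounded annual rates back to totals, does not reproduce every cumulative figure exactly (0.22 × 13 is 2.86, not 2.91), which is a reminder that the published rates are themselves rounded.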

Local Area Variation in Life Expectancy by Income

Levels of Life Expectancy by Commuting Zone

Life expectancy varied significantly across areas within the United States, especially for low-income individuals. Figure 4 shows life expectancy by income ventile for New York, New York; San Francisco, California; Dallas, Texas; and Detroit, Michigan. There was substantial variation across these areas for low-income individuals, but little variation for high-income individuals. Life expectancy ranged from 72.3 years to 78.6 years for men in the lowest income ventile across these 4 cities; the corresponding range for men in the top ventile was 86.5 years to 87.5 years.

The results in Figure 4 are representative of the variation across commuting zones more generally. The SD of life expectancy across all commuting zones (weighted by population) was 1.39 years for men in the bottom income quartile vs 0.70 years in the top income quartile (P < .001). Life expectancy varied less across areas for women than men in the bottom income quartile, and the amount of variation across commuting zones also declined with income for women (eTable 5 in the Supplement).

Figure 5 shows maps of expected age at death by commuting zone for men and women in the bottom and top quartiles of the national income distribution (maps for the middle-income quartiles appear in eFigure 10 in the Supplement). For individuals in the bottom income quartile, life expectancy differed by about 5 years for men and 4 years for women between the lowest and highest longevity commuting zones (P < .001 for both sexes). A summary of standard errors by commuting zone appears in part V.C of the eAppendix and in eFigure 11.

Nevada, Indiana, and Oklahoma had the lowest life expectancies (<77.9 years) when men and women in the bottom income quartile were averaged. Of the 10 states with the lowest levels of life expectancy for individuals in the bottom income quartile, 8 formed a geographic belt from Michigan to Kansas (Michigan, Ohio, Indiana, Kentucky, Tennessee, Arkansas, Oklahoma, Kansas). The states with the highest life expectancies for individuals in the bottom income quartile (>80.6 years) were California, New York, and Vermont. Life expectancy in the South was similar to the national mean for both sexes (-0.22 years [P = .47] for women and -0.96 years [P = .03] for men) in the bottom income quartile. Individuals in the top income quartile had the lowest life expectancies (<85.3 years) in Nevada, Hawaii, and Oklahoma. Individuals in the top income quartile had the highest life expectancies (>87.6 years) in Utah, Washington DC, and Vermont.

Table 1 lists the top 10 and bottom 10 commuting zones in mean life expectancy (averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The expected age at death for the bottom quartile ranged from 74.2 years for men and 80.7 years for women in Gary, Indiana, to 79.5 years for men and 84.0 years for women in New York, New York. The commuting zones with the highest life expectancies

Figure 2. Race- and Ethnicity-Adjusted Life Expectancy for 40-Year-Olds by Household Income Percentile, 2001-2014

[Figure: expected age at death for 40-year-olds (years, axis from 70 to 90) against household income percentile (0 to 100), with mean household income in thousands of dollars (a) shown beneath the axis. Series: Women, Men.]

Expected age at death, y
Women, by household income percentile: Bottom 1%: 78.8 (95% CI, 78.7-78.9); Top 1%: 88.9 (95% CI, 88.7-89.1)
Men, by household income percentile: Bottom 1%: 72.7 (95% CI, 72.6-72.9); Top 1%: 87.3 (95% CI, 87.2-87.5)

Life expectancies were calculated using survival curves analogous to those in Figure 1. The vertical height of each bar depicts the 95% confidence interval. The difference between expected age at death in the top and bottom income percentiles is 10.1 years (95% CI, 9.9-10.3 years) for women and 14.6 years (95% CI, 14.4-14.8 years) for men. To control for differences in life expectancies across racial and ethnic groups, race and ethnicity adjustments were calculated using data from the National Longitudinal Mortality Survey, and estimates were reweighted so that each income percentile bin has the same fraction of black, Hispanic, and Asian adults.
a Averaged across years and ages. The data are in thousands unless otherwise indicated.

Association Between Income and Life Expectancy in the United States. Special Communication, Clinical Review & Education. jama.com (Reprinted) JAMA. Published online April 10, 2016. E5.
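The gaps quoted in the figure note follow directly from the four point estimates shown in the figure. As a small worked check, assuming the estimates read 78.8 and 88.9 years for women and 72.7 and 87.3 years for men (the key names below are made up for this sketch):

```python
# Expected age at death for 40-year-olds, by sex and income percentile.
expected_age_at_death = {
    "women": {"bottom_1pct": 78.8, "top_1pct": 88.9},
    "men": {"bottom_1pct": 72.7, "top_1pct": 87.3},
}

# Gap between the top and bottom income percentiles, rounded to one decimal.
gaps = {
    sex: round(ages["top_1pct"] - ages["bottom_1pct"], 1)
    for sex, ages in expected_age_at_death.items()
}

print(gaps)  # {'women': 10.1, 'men': 14.6}
```

The confidence intervals on the gaps cannot be recovered this way, because they depend on the sampling variability of both estimates.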

A data set in pictures: Chetty et al., JAMA 2016, as rendered in NYT

A data set in pictures: Chetty et al., JAMA 2016

were clustered in California (6 of the top 10), whereas the commuting zones with the lowest life expectancies were clustered in the industrial Midwest (5 of the bottom 10). The commuting zones with the highest life expectancies for those in the bottom income quartile also had the smallest gaps in life expectancy between the top and bottom quartiles (r = -0.82, P < .001). The expected age at death for the top income quartile ranged from 82.8 years for men and 85.3 years for women in Las Vegas, Nevada, to 86.6 years for men and 89.0 years for women in Salt Lake City, Utah. The areas with the highest and lowest life expectancies for those in the top income quartile were less clustered geographically; for example, California had commuting zones in both the top 10 and bottom 10 of the list.

The differences in life expectancy across commuting zones were similar in analyses with income measures adjusted for cost of living, with controls for differences across areas in the income distribution within each quartile, and using measures of loss in life years up to the age of 77 years that did not make use of extrapolations beyond observed ages (part IV.C of the eAppendix and eTable 6 in the Supplement). There was also considerable variation in life expectancy across counties within commuting zones (part V of the eAppendix, eFigure 12, and eTable 7).

Trends in Life Expectancy

Similar to levels of life expectancy, temporal trends varied significantly across geographic areas. Figure 6 maps the annual change in life expectancy between 2001 and 2014 by state for men and women in the bottom income quartile. Hawaii, Maine, and Massachusetts had the largest gains in life expectancy (gaining >0.19 years annually) when men and women in the bottom income quartile were averaged. The states in which low-income individuals experienced the largest losses in life expectancy (losing >0.09 years annually) were Alaska, Iowa, and Wyoming.

Table 2 lists the top 10 and bottom 10 commuting zones in terms of trends in life expectancy (when averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The estimated trends for individuals in the bottom income quartile ranged from an annual gain of 0.38 years in Toms River, New Jersey, to an annual loss of 0.17 years in Tampa, Florida. Gaps in life expectancy between the bottom and top income quartiles generally declined or remained stable in areas in which the bottom income quartile experienced the largest gains in life expectancy, such as Toms River, New Jersey. In contrast, gaps in life expectancy between the top and bottom income quartiles increased by approximately 0.3 years annually in places such as Tampa, Florida.

Figure 7 shows race- and ethnicity-adjusted life expectancies by year for men and women in the bottom income quartile in 2 commuting zones in the top 10 (Birmingham, Alabama, and Cincinnati, Ohio) and 2 commuting zones in the bottom 10 (Knoxville, Tennessee, and Tampa, Florida). This Figure shows that trends in life expectancy across these areas diverged continuously throughout the 2000s. For example, life expectancy increased by approximately 3.2 years from 2001 through 2014 for men and women in Cincinnati, Ohio, but declined by approximately 2.2 years in Tampa, Florida.

Correlates of Local Area Variation in Life Expectancy

Figure 8 shows correlations of commuting zone-level estimates of race- and ethnicity-adjusted life expectancy for the bottom income quartile with local area characteristics. The correlations are divided into 6 groups: health behaviors, access to health care, environmental factors, income inequality and social cohesion, local labor market conditions, and other factors. Data for men and women are combined; correlations were similar by sex (eTable 8 in the Supplement). County-level correlations were also similar (eTable 9).

Health Behaviors

Life expectancy was negatively correlated with rates of smoking (r = -0.69, P < .001) and obesity (r = -0.47, P < .001) and positively correlated with exercise rates (r = 0.32, P = .004) among individuals in the bottom income quartile. The maps for rates of smoking, obesity, and exercise among low-income individuals were similar to those for life expectancy (eFigure 13 in the Supplement).

Figure 4. Race- and Ethnicity-Adjusted Life Expectancy by Income Ventile in Selected Commuting Zones, 2001-2014

[Figure: two panels, Men and Women. Each plots expected age at death for 40-year-olds (years, axis from 70 to 90) against household income ventile (0 to 20), with mean household income in thousands of dollars (a) shown beneath the axis. Series in each panel: Detroit, MI; New York, NY; San Francisco, CA; Dallas, TX.]

Estimates of race- and ethnicity-adjusted expected age at death for 40-year-olds computed by income ventile (5 percentile point bins).
a Averaged across years and ages.

Visualizing a research strategy

Figure 1: Pollution in China and the Huai River/Qinling Mountain Range

Notes: The cities shown are the locations of the Disease Surveillance Points. Cities north of the solid line were covered by the home heating policy. The figure coloring is generated by interpolating PM10 levels at the 12 nearest pollution monitoring stations to create a high-resolution grid of pollution throughout China (1 degree latitude cell width). Areas that are not within acceptable range of a station are left in white.

Visualizing a research strategy

Figure 2. Particulate Matter Levels (PM10) South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging PM10 across the Disease Surveillance Point (DSP) locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression estimated separately on each side of the Huai River.

[Figure: scatter plot with fitted lines. x-axis: degrees north of the Huai River boundary (-20 to 20); y-axis: PM10 (µg/m³). Legend: PM10 in South, PM10 in North, Local Linear Regression.]

The estimated change in PM10 (and height of the brace) just north of the Huai River is 41.6 µg/m³ and is statistically significant (95% CI: 11.6, 71.6).
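The idea behind the plotted lines can be sketched in a few lines of Python: fit one weighted line on each side of the boundary and compare the two fitted values at zero. This is only an illustration under stated assumptions. The triangular kernel, the bandwidth of 10 degrees, and the function names are choices made here, not details taken from the paper, and the data are synthetic.

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def jump_at_boundary(x, y, bandwidth=10.0):
    """Fitted value at x = 0 from the north side minus the south side."""
    def intercept(on_side):
        pts = [(xi, yi) for xi, yi in zip(x, y)
               if on_side(xi) and abs(xi) < bandwidth]
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        # Triangular kernel (an assumption): points nearer the boundary
        # receive more weight.
        ws = [1.0 - abs(xi) / bandwidth for xi in xs]
        a, _ = weighted_line_fit(xs, ys, ws)
        return a
    return intercept(lambda v: v >= 0) - intercept(lambda v: v < 0)


# Synthetic 1-degree bins (not the paper's data): linear trend plus a jump of 40.
degrees = [d + 0.5 for d in range(-10, 10)]
pm10 = [100.0 + 2.0 * d + (40.0 if d >= 0 else 0.0) for d in degrees]
jump = jump_at_boundary(degrees, pm10)
```

Because the synthetic data are exactly linear on each side, the estimated jump recovers the built-in discontinuity; with real data the estimate would depend on the bandwidth and kernel.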

Visualizing a research strategy

Figure 3. Life Expectancy South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point (DSP) locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression estimated separately on each side of the Huai River.

[Figure: scatter plot with fitted lines. x-axis: degrees north of the Huai River boundary (-20 to 20); y-axis: life expectancy in years (70 to 85). Legend: LE in South, LE in North, Local Linear Regression.]

The estimated change in Life Expectancy (and height of the brace) just north of the Huai River is −3.1 years and is statistically significant (95% CI: −5.0, −1.3).
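A quick way to read a reported interval is to back out the implied standard error and check that the interval excludes zero. The sketch below assumes a symmetric, normal-approximation 95% interval, and it reads the extracted figures as −3.1 years with a CI of −5.0 to −1.3 (decimal points appear to have been lost in extraction). The function names are illustrative.

```python
def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error implied by a symmetric normal-approximation CI."""
    return (upper - lower) / (2 * z_crit)


def excludes_zero(lower, upper):
    """True when the interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


# Life expectancy discontinuity as read from the slide (assumed decimals).
le_estimate, le_lower, le_upper = -3.1, -5.0, -1.3
le_se = se_from_ci(le_lower, le_upper)   # roughly 0.94 years
le_z = le_estimate / le_se               # roughly -3.3
significant = excludes_zero(le_lower, le_upper)
```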

Tables

• Use the same principles:
  • Show the data
  • Don't lie about it
  • Focus
• Which translates into:
  • Report the important numbers, not all the coefficients
  • Keep the number of significant digits down
  • No vertical lines
  • Very few horizontal lines; don't box results (3 lines are usually enough)
  • When in doubt, align left
  • Report the statistics that the reader will need, not necessarily those that come by default
• Sample questions to ask yourself:
  • Do you need the R²?
  • Are there important tests to report (say, a test of equality between two coefficients)?
  • You may need the mean in the control group
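The layout rules above can be made concrete with a small plain-text renderer. This is a minimal sketch, not a replacement for a table package: the function name `format_table` and the choice of three significant digits are assumptions made for illustration. It left-aligns every column, uses no vertical lines, and draws exactly three horizontal rules.

```python
def format_table(header, rows, digits=3):
    """Render a left-aligned text table with exactly three horizontal rules."""
    def cell(value):
        # Floats are cut to a small number of significant digits.
        return f"{value:.{digits}g}" if isinstance(value, float) else str(value)

    body = [[cell(v) for v in row] for row in rows]
    widths = [max(len(r[i]) for r in [header] + body) for i in range(len(header))]

    def line(cells):
        return "  ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    rule = "-" * (sum(widths) + 2 * (len(widths) - 1))
    return "\n".join([rule, line(header), rule, *[line(r) for r in body], rule])


example = format_table(
    ["", "(1)", "(2)"],
    [["Treatment", 0.04938, -0.2101], ["Observations", 390, 390]],
)
print(example)
```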

Table 6. Impact of the reform on MGNREGS projects: Evidence from asset survey

--------------------------------------------------------------------
                   Number registered          Number found
                   All projects   Ongoing     All projects   Ongoing
                   (1)            (2)         (3)            (4)
--------------------------------------------------------------------
Treatment          0.0494         -0.210      0.309          0.0271
                   (0.263)        (0.413)     (0.239)        (0.267)
Observations       390            390         385            385
Mean in Control    13.80          11.69       11.79          9.819
--------------------------------------------------------------------

Note: The unit of observation is a Gram Panchayat (GP). The dependent variables are: (1) the number of projects registered in the public data portal (nrega.nic.in) on May 15, 2013; (2) the number of projects declared as ongoing in nrega.nic.in; and the number of registered (3) and ongoing (4) projects found by surveyors in June-July 2013. Out of 5390 projects registered in nrega.nic.in for the 390 GPs of the survey sample, a random sample of 3900 projects was surveyed (10 per GP). The number of projects found in the survey is scaled up by the ratio of registered projects to sampled projects. 5 GPs (28 projects) could not be surveyed. All specifications include district fixed effects.
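The scaling step in the table note is simple arithmetic, sketched here under the reading that the count of projects found in the sample is multiplied by the ratio of registered to sampled projects. The function name and the example numbers are hypothetical, not taken from the survey.

```python
def scaled_found(found_in_sample, registered, sampled):
    """Scale a sample count up to the GP level: found * registered / sampled."""
    if sampled <= 0:
        raise ValueError("at least one project must have been sampled")
    return found_in_sample * registered / sampled


# Hypothetical GP: 14 registered projects, 10 sampled, 8 of those found.
estimate = scaled_found(found_in_sample=8, registered=14, sampled=10)
```

When every registered project is sampled, the ratio is one and the estimate equals the raw count.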

Table 7. Impact of the reform on fake beneficiaries: Evidence from matching of nrega.nic.in job cards with SECC census

-------------------------------------------------------------------------------------
                   All job cards         Job cards with at least one working member
                   (as of April 2014)    Intervention period     Post intervention
                                         July 2012-March 2013    Apr 2013-March 2014
                   (1)                   (2)                     (3)
-------------------------------------------------------------------------------------
Panel A: Match rate for job cards with one member only
Treatment          0.0187                0.0181                  0.0107
                   (0.00741)             (0.00766)               (0.00696)
Observations       3095                  2868                    2922
Mean in Control    0.644                 0.673                   0.698

Panel B: Match rate for job cards with two members or more
Treatment          0.0135                0.0126                  0.0104
                   (0.00613)             (0.00764)               (0.00732)
Observations       3093                  2836                    2906
Mean in Control    0.243                 0.282                   0.286
-------------------------------------------------------------------------------------

Note: The unit of observation is a GP. The dependent variable is the fraction of job cards from nrega.nic.in matched by name with households from the SECC census. A job card with two members or more is matched when at least two members have been matched by name with a census household. The nrega.nic.in data was extracted from the nrega.nic.in server; it covers the period from July 2011 to March 2014. Treatment is a dummy which is equal to one for the blocks selected for the intervention. All specifications include district fixed effects.
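The matching rule in the note can be written down as a short sketch. The note does not describe how names were compared, so the exact-match-after-normalization step below is an assumption, as are the function names and the toy households.

```python
def normalize(name):
    """Lower-case and collapse whitespace (a stand-in for real name matching)."""
    return " ".join(name.lower().split())


def card_is_matched(members, households):
    """One-member cards need 1 name match; larger cards need at least 2,
    all within the same census household."""
    needed = 1 if len(members) == 1 else 2
    names = {normalize(m) for m in members}
    return any(
        len(names & {normalize(n) for n in household}) >= needed
        for household in households
    )


def match_rate(cards, households):
    """Fraction of job cards that are matched."""
    return sum(card_is_matched(c, households) for c in cards) / len(cards)


# Toy data, purely illustrative.
census = [["Asha Devi", "Ram Kumar"], ["Sita"]]
cards = [["sita "], ["Asha Devi", "Mohan"], ["Ram Kumar", "asha devi", "X"]]
rate = match_rate(cards, census)
```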

Stargazer

• In practice, in R, use the "stargazer" package
• It will create standardized tables and can output them for you in HTML, LaTeX, Word, or ASCII
• The default is not bad
• But you can customize it to choose what statistics to include, what coefficients to include, etc.
• https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf
• http://jakeruss.com/cheatsheets/stargazer.html

Default output

Table 1: Results

------------------------------------------------------------
                               Dependent variable: ftvoteshare
                               (1)            (2)
------------------------------------------------------------
fncandidates                   0.134***
                               (0.007)
RESprior                                      0.059**
                                              (0.024)
Constant                       0.012          0.094***
                               (0.010)        (0.019)
Observations                   372            372
R2                             0.532          0.016
Adjusted R2                    0.530          0.013
Residual Std. Error (df = 370) 0.153          0.222
F Statistic (df = 1; 370)      420.093***     6.056**
------------------------------------------------------------
Note: *p<0.1; **p<0.05; ***p<0.01

Improve on it

• Transparent label names for dependent and independent variables
• Choose the coefficients you would like to include
• No need to include both R squared and adjusted R squared
• Maybe report the mean of the dependent variable in the control group rather than the constant
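Two of the improvements above are easy to illustrate outside any table package: keeping only the coefficients of interest under readable labels, and reporting the control-group mean of the dependent variable. The sketch below is in Python rather than R only to stay self-contained; the helper names, the label mapping, and the toy numbers are all hypothetical.

```python
from statistics import fmean


def control_mean(outcome, treated):
    """Mean of the dependent variable among untreated observations."""
    return fmean([y for y, t in zip(outcome, treated) if t == 0])


def tidy_rows(coefs, keep, labels):
    """Keep only the chosen coefficients, in order, under readable labels."""
    return [(labels.get(name, name), coefs[name]) for name in keep if name in coefs]


# Toy regression output: fixed effects and the constant are left out of the table.
coefs = {"treatment": 0.05, "Constant": 1.2, "district_3": 0.4}
rows = tidy_rows(coefs, keep=["treatment"], labels={"treatment": "Treatment"})
mean_in_control = control_mean([1.0, 3.0, 10.0, 5.0], [0, 0, 1, 0])
```

With fixed effects in the specification the constant no longer equals the control mean, which is one reason to report the mean directly.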

Visualizing regression results

[Figure 1: Overview of Bandhan Results]

[Figure 2: Endline 1 - At a Glance]

Tools and resources

• R is a great visualization tool (especially ggplot; you need to pick up a good book to choose colors, etc.)
• Yau's FlowingData has many tutorials on how to do things in R (http://flowingdata.com/category/tutorials) and a 4-week mini-course you can take at your own pace
• R handles/produces maps as well
• At this site you can find Tufte's charts in R: http://motioninsocial.com/tufte

References

• Edward Tufte. The Visual Display of Quantitative Information. Graphics Press, 2013 (second edition).
• Jonathan Schwabish. "An Economist's Guide to Visualizing Data." Journal of Economic Perspectives, 2014, vol. 28, number 1.
• Banerjee, Chattopadhyay, Duflo, Shapiro. "The long term impact of a graduation program: Evidence from West Bengal." MIMEO, MIT.
• Duflo, Esther. "Schooling and Labor Market Consequences of School Construction in Indonesia." American Economic Review, 2001.
• Chetty, Raj, Michael Stepner, Sarah Abraham, Shelby Lin, Benjamin Scuderi, Nicholas Turner, Augustin Bergeron, and David Cutler. "The association between income and life expectancy in the United States, 2001-2014." JAMA 315, no. 16 (2016): 1750-1766.
• Bui, Quoctrung, and Neil Irwin. "The Rich Live Longer Everywhere. For the Poor, Geography Matters." New York Times, April 11, 2016.
• Chen, Yuyu, Avraham Ebenstein, Michael Greenstone, and Hongbin Li. "Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China's Huai River Policy."
• Lee, David S., Enrico Moretti, and Matthew J. Butler. "Do voters affect or elect policies? Evidence from the US House." The Quarterly Journal of Economics (2004): 807-859.
• Klerman, Jacob Alex, and Caroline Danielson. "The Transformation of the Supplemental Nutrition Assistance Program." Journal of Policy Analysis and Management, vol. 30, no. 4, 2011, pp. 863-888.
• Stinebrickner, Ralph, and Todd Stinebrickner. "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model." Western University, CIBC Working Paper 2013-5 (July), 2013. http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf
• Ottaviano, Gianmarco I. P., and Giovanni Peri. "Immigration and National Wages: Clarifying the Theory and the Empirics." NBER Working Paper 14188 (July), 2008.
• Social Security Advisory Board. Aspects of Disability Decision Making: Data and Materials. Social Security Administration (February), 2012. http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf
• Social Security Administration. Fast Facts & Figures About Social Security, 2009. Social Security Administration (July), 2009. http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf
• Hanson, Gordon H. 2012. "The Rise of Middle Kingdoms: Emerging Economies in Global Trade." Journal of Economic Perspectives 26(2): 41-64.


218 Journal of Economic Perspectives

Figure 5A. An Unbalanced Chart

Percentage of Employed Who Are Senior Managers, by Sex, 2008

[Figure: paired column chart. Series: Women, Men. y-axis: 0 to 20. Countries on the x-axis: United States, New Zealand, United Kingdom, Ireland, Australia, Estonia, Belgium, Greece, Canada, Iceland, France, Italy, Netherlands, Finland, OECD average, Hungary, Spain, Israel, Slovenia, Poland, Czech Republic, Switzerland, Austria, Portugal, Norway, Slovak Republic, Germany, Sweden, Luxembourg, Turkey, Denmark, Mexico, Korea.]

Source: Author, based on OECD (no date) and Rampell (2013).

Figure 4B. Flattening a 3D Chart

Change in real weekly wages of US-born workers by group, 1990-2006 (Percent)

[Figure: paired column chart. Series: Young (experience below 20 years), Old (experience above 20 years). Categories: Some High School, High School Graduate, Some College, College Graduate. y-axis: -6 to 12.]

An Economist's Guide to Visualizing Data 221

Figure 6A. A Spaghetti Chart

Initial DI Worker Awards by Major Cause of Disability, Calendar Years 1975-2010

[Figure: line chart. x-axis: 1975 to 2010; y-axis: 0 to 35. Series: Mental, Cancer, Circulatory, Musculoskeletal.]

Source: Social Security Advisory Board (2012).

Figure 6B. Revising the Spaghetti Chart

Initial DI Worker Awards by Major Cause of Disability, Calendar Years 1975-2010 (Percent)

[Figure: small-multiple line charts, one panel each for Circulatory, Mental, Musculoskeletal, and Cancer. x-axis: 1975 to 2010.]


An Economist's Guide to Visualizing Data 225

in this case and is a useful approach when labels are difficult to fit in the vertical column chart layout (recall Figure 3; also see Schwabish 2013c). Also notice the (subjective) decision to omit the y-axis; the usefulness of the y-axis is doubtful with data labels placed on top of each column.

Figure 9A. Two Pie Charts for Comparison

Shares of Aggregate Income, 1962 and 2007

[Figure: two pie charts of aggregate income by source.
1962: Social Security 30, Earnings 28, Other 16, Asset income 15, Government employee pensions 6, Private pensions 3.
2007: Social Security 36, Earnings 29, Asset income 16, Private pensions 9, Government employee pensions 8, Other 3.]

Source: Social Security Administration (2009).

Figure 9B. Alternative to a Pie Chart: A Paired Column Chart

Shares of Aggregate Income, 1962 and 2009 (Percent)

                                1962   2009
Social Security                 30     38
Earnings                        28     29
Asset income                    15     11
Private pensions                3      9
Government employee pensions    6      9
Other                           18     4


226 Journal of Economic Perspectives

Alternatively, the stacked bar chart in Figure 9C shows the distribution of the various groups and that the groups sum to 100 percent, while also highlighting differences from one year to the other. Finally, the slope chart in Figure 9D also shows the difference in each category from the first year to the last by pairing points on two vertical axes. Slope charts can be used for a variety of purposes, including showing correlations: for example, the relationship between a state's obesity rate and the share of people with at least a bachelor's degree (Cairo 2013). In this example, the color contrast (or what appears as different shades of grey in the black-and-white printed version) identifies which categories increased over time (blue, darker) and those that declined (orange, lighter).

Figure 9C. Alternative to a Pie Chart: A Stacked Bar Chart

Shares of Aggregate Income, 1962 and 2009 (Percent)

[Figure: two stacked bars, 1962 and 2009. Segments: Social Security, Earnings, Other, Asset income, Government employee pensions, Private pensions.]

Figure 9D. Alternative to a Pie Chart: The Slope Chart

Shares of Aggregate Income, 1962 and 2009 (Percent)

[Figure: slope chart pairing 1962 and 2009 values. 1962 labels: Social Security 30, Earnings 28, Other 18, Asset income 15, Government employee pensions 6, Private pensions 3. 2009 labels: 38, 29, 11, 9, 4.]
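The color rule of the slope chart (one color for categories that rose, another for those that fell) amounts to a sign check on each pair of endpoints. The sketch below uses the 1962 and 2009 shares as read from the figure labels; the pairing of 2009 values with categories is inferred from the chart, and the handling of ties is an assumption, since the text does not cover that case.

```python
# Shares of aggregate income (percent) as (1962, 2009), read from the figure labels.
shares = {
    "Social Security": (30, 38),
    "Earnings": (28, 29),
    "Asset income": (15, 11),
    "Private pensions": (3, 9),
    "Government employee pensions": (6, 9),
    "Other": (18, 4),
}


def slope_color(start, end):
    """Blue for an increase, orange for a decline, grey for no change (assumed)."""
    if end > start:
        return "blue"
    if end < start:
        return "orange"
    return "grey"


colors = {name: slope_color(start, end) for name, (start, end) in shares.items()}
increased = sorted(name for name, color in colors.items() if color == "blue")
declined = sorted(name for name, color in colors.items() if color == "orange")
```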


A data set in pictures: Chetty et al., JAMA 2016

women, the comparable changes were 0.23 years in the highest quartile and 0.10 years in the lowest quartile (P < .001). These differences persisted after controlling for the higher growth rate of income for individuals in the top quartile relative to the bottom quartile (eTable 4 in the Supplement).

The lower panels of Figure 3 show the annual increase in race-adjusted life expectancy by income ventiles. The annual increase in longevity was 0.18 years for men (which translates to an increase of 2.34 years from 2001-2014) and 0.22 years for women (an increase of 2.91 years from 2001-2014) in the top 5% of the income distribution. In the bottom 5% of the income distribution, the average annual increase in longevity was 0.02 years (an increase of 0.32 years from 2001-2014) for men and 0.003 years (an increase of 0.04 years from 2001-2014) for women (P < .001 for the differences between top and bottom 5% of income distributions for both sexes).
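The translation from an annual increase to a 2001-2014 total is a multiplication by 13 years. The sketch below reads the extracted figures with their decimal points restored (for example, 0.18 years per year and 2.34 years in total), which is an assumption about the garbled text. It also shows why some reported totals differ slightly from the rounded rate times 13: the totals appear to come from unrounded slopes.

```python
YEARS = 2014 - 2001  # 13 years between the first and last sample years


def cumulative_change(annual_change, years=YEARS):
    """Total change implied by a constant annual change."""
    return annual_change * years


men_top = cumulative_change(0.18)        # matches the reported 2.34
women_bottom = cumulative_change(0.003)  # rounds to the reported 0.04

# Going the other way: the reported totals imply annual rates that round
# to the published ones, even where rate * 13 does not reproduce the total.
women_top_rate = 2.91 / YEARS    # about 0.224, published as 0.22
men_bottom_rate = 0.32 / YEARS   # about 0.025, published as 0.02
```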

Local Area Variation in Life Expectancy by Income

Levels of Life Expectancy by Commuting Zone

Life expectancy varied significantly across areas within the United States, especially for low-income individuals. Figure 4 shows life expectancy by income ventile for New York, New York; San Francisco, California; Dallas, Texas; and Detroit, Michigan. There was substantial variation across these areas for low-income individuals but little variation for high-income individuals. Life expectancy ranged from 72.3 years to 78.6 years for men in the lowest income ventile across these 4 cities; the corresponding range for men in the top ventile was 86.5 years to 87.5 years.

The results in Figure 4 are representative of the variation across commuting zones more generally. The SD of life expectancy across all commuting zones (weighted by population) was 1.39 years for men in the bottom income quartile vs 0.70 years in the top income quartile (P < .001). Life expectancy varied less across areas for women than men in the bottom income quartile, and the amount of variation across commuting zones also declined with income for women (eTable 5 in the Supplement).
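A population-weighted standard deviation of the kind mentioned above can be computed as follows. This is a minimal sketch: the formula uses weights normalized by their sum with no degrees-of-freedom correction, which is an assumption, since the text only says the SD was weighted by population. The numbers are toy values.

```python
from math import sqrt


def weighted_sd(values, weights):
    """Standard deviation with each value weighted by its population."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return sqrt(variance)


# Toy commuting zones: life expectancy in years and population weights.
life_expectancy = [78.0, 80.0, 80.0]
population = [2, 1, 1]
sd = weighted_sd(life_expectancy, population)
```

Large zones pull the weighted mean toward their value, so a small zone with an extreme life expectancy contributes little to the spread.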

Figure 5 shows maps of expected age at death by commuting zone for men and women in the bottom and top quartiles of the national income distribution (maps for the middle-income quartiles appear in eFigure 10 in the Supplement). For individuals in the bottom income quartile, life expectancy differed by about 5 years for men and 4 years for women between the lowest and highest longevity commuting zones (P < .001 for both sexes). A summary of standard errors by commuting zone appears in part V.C of the eAppendix and in eFigure 11.

Nevada, Indiana, and Oklahoma had the lowest life expectancies (<77.9 years) when men and women in the bottom income quartile were averaged. Of the 10 states with the lowest levels of life expectancy for individuals in the bottom income quartile, 8 formed a geographic belt from Michigan to Kansas (Michigan, Ohio, Indiana, Kentucky, Tennessee, Arkansas, Oklahoma, Kansas). The states with the highest life expectancies for individuals in the bottom income quartile (>80.6 years) were California, New York, and Vermont. Life expectancy in the South was similar to the national mean for both sexes (−0.22 years [P = .47] for women and −0.96 years [P = .03] for men) in the bottom income quartile. Individuals in the top income quartile had the lowest life expectancies (<85.3 years) in Nevada, Hawaii, and Oklahoma. Individuals in the top income quartile had the highest life expectancies (>87.6 years) in Utah, Washington, DC, and Vermont.

Table 1 lists the top 10 and bottom 10 commuting zones in mean life expectancy (averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The expected age at death for the bottom quartile ranged from 74.2 years for men and 80.7 years for women in Gary, Indiana, to 79.5 years for men and 84.0 years for women in New York, New York. The commuting zones with the highest life expectancies

Figure 2. Race- and Ethnicity-Adjusted Life Expectancy for 40-Year-Olds by Household Income Percentile, 2001-2014

[Figure: expected age at death for 40-year-olds, y (70 to 90), plotted against household income percentile (0 to 100), with mean household income in thousands, $ (a), shown below the axis. Series: Women, Men.]

Expected age at death, y
Women, by household income percentile: bottom 1%: 78.8 (95% CI, 78.7-78.9); top 1%: 88.9 (95% CI, 88.7-89.1)
Men, by household income percentile: bottom 1%: 72.7 (95% CI, 72.6-72.9); top 1%: 87.3 (95% CI, 87.2-87.5)

Life expectancies were calculated using survival curves analogous to those in Figure 1. The vertical height of each bar depicts the 95% confidence interval. The difference between expected age at death in the top and bottom income percentiles is 10.1 years (95% CI, 9.9-10.3 years) for women and 14.6 years (95% CI, 14.4-14.8 years) for men. To control for differences in life expectancies across racial and ethnic groups, race and ethnicity adjustments were calculated using data from the National Longitudinal Mortality Survey, and estimates were reweighted so that each income percentile bin has the same fraction of black, Hispanic, and Asian adults.
a Averaged across years and ages. The data are in thousands unless otherwise indicated.

A data set in pictures: Chetty et al., JAMA 2016, as rendered in NYT

A data set in pictures: Chetty et al., JAMA 2016

were clustered in California (6 of the top 10), whereas the commuting zones with the lowest life expectancies were clustered in the industrial Midwest (5 of the bottom 10). The commuting zones with the highest life expectancies for those in the bottom income quartile also had the smallest gaps in life expectancy between the top and bottom quartiles (r = −0.82, P < .001). The expected age at death for the top income quartile ranged from 82.8 years for men and 85.3 years for women in Las Vegas, Nevada, to 86.6 years for men and 89.0 years for women in Salt Lake City, Utah. The areas with the highest and lowest life expectancies for those in the top income quartile were less clustered geographically; for example, California had commuting zones in both the top 10 and bottom 10 of the list.

The differences in life expectancy across commuting zones were similar in analyses with income measures adjusted for cost of living, with controls for differences across areas in the income distribution within each quartile, and using measures of loss in life years up to the age of 77 years that did not make use of extrapolations beyond observed ages (part IV.C of the eAppendix and eTable 6 in the Supplement). There was also considerable variation in life expectancy across counties within commuting zones (part V of the eAppendix, eFigure 12, and eTable 7).

Trends in Life Expectancy

Similar to levels of life expectancy, temporal trends varied significantly across geographic areas. Figure 6 maps the annual change in life expectancy between 2001 and 2014 by state for men and women in the bottom income quartile. Hawaii, Maine, and Massachusetts had the largest gains in life expectancy (gaining >0.19 years annually) when men and women in the bottom income quartile were averaged. The states in which low-income individuals experienced the largest losses in life expectancy (losing >0.09 years annually) were Alaska, Iowa, and Wyoming.

Table 2 lists the top 10 and bottom 10 commuting zones in terms of trends in life expectancy (when averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The estimated trends for individuals in the bottom income quartile ranged from an annual gain of 0.38 years in Toms River, New Jersey, to an annual loss of 0.17 years in Tampa, Florida. Gaps in life expectancy between the bottom and top income quartiles generally declined or remained stable in areas in which the bottom income quartile experienced the largest gains in life expectancy, such as Toms River, New Jersey. In contrast, gaps in life expectancy between the top and bottom income quartiles increased by approximately 0.3 years annually in places such as Tampa, Florida.

Figure 7 shows race- and ethnicity-adjusted life expectancies by year for men and women in the bottom income quartile in 2 commuting zones in the top 10 (Birmingham, Alabama, and Cincinnati, Ohio) and 2 commuting zones in the bottom 10 (Knoxville, Tennessee, and Tampa, Florida). This figure shows that trends in life expectancy across these areas diverged continuously throughout the 2000s. For example, life expectancy increased by approximately 3.2 years from 2001 through 2014 for men and women in Cincinnati, Ohio, but declined by approximately 2.2 years in Tampa, Florida.

Correlates of Local Area Variation in Life ExpectancyFigure 8 shows correlations of commuting zone-level estimates ofrace- and ethnicity-adjusted life expectancy for the bottom in-come quartile with local area characteristics The correlations are di-vided into 6 groups health behaviors access to health care envi-ronmental factors income inequality and social cohesion local labormarket conditions and other factors Data for men and women arecombined correlations were similar by sex (eTable 8 in the Supple-ment) County-level correlations were also similar (eTable 9)

Health BehaviorsLife expectancy was negatively correlated with rates of smoking(r = minus069 P lt 001) and obesity (r = minus047 P lt 001) and positivelycorrelated with exercise rates (r = 032 P = 004) among individu-als in the bottom income quartile The maps for rates of smokingobesity and exercise among low-income individuals were similarto those for life expectancy (eFigure 13 in the Supplement)

Figure 4 Race- and Ethnicity-Adjusted Life Expectancy by Income Ventile in Selected Commuting Zones 2001-2014

90

85

80

75

700 5

30

10

60

15

101

20

683

Exp

ecte

d A

ge

at D

eath

fo

r 4

0-Y

ear-

Old

s y

Household Income Ventile

Mean household income in thousands $ a

Men90

85

80

75

700 5

27

10

54

15

95

20

653

Exp

ecte

d A

ge

at D

eath

fo

r 4

0-Y

ear-

Old

s y

Household Income Ventile

Women

Detroit MI

New York NY

San Francisco CA

Dallas TX

Detroit MI

New York NY

San Francisco CA

Dallas TX

Estimates of race- and ethnicity-adjusted expected age at death for 40-year-olds computed by income ventile (5 percentile point bins)a Averaged across years and ages

Association Between Income and Life Expectancy in the United States Special Communication Clinical Review amp Education

jamacom (Reprinted) JAMA Published online April 10 2016 E7

Copyright 2016 American Medical Association All rights reserved

Downloaded From httpjamajamanetworkcom by Suman Ganguli on 04142016

43 56

Visualizing a research strategy

Figure 1

Pollution in China and the Huai RiverQinling Mountain Range

Notes The cities shown are the locations of the Disease Surveillance Points Cities north of the solid line were covered by the home heating policy The figure coloring is generated by interpolating PM10 levels at the 12 nearest pollution monitoring stations to create a high resolution grid of pollution throughout China (1 degree latitude cell width) Areas are left in white which are not within acceptable range of a station

44 56

Visualizing a research strategy

Figure 2: Particulate Matter Levels (PM10) South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging PM10 across the Disease Surveillance Point (DSP) locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

[Scatter plot: PM10 (µg/m³), axis 40 to 160, against degrees north of the Huai River boundary (-20 to 20). Legend: PM10 in South, PM10 in North, Local Linear Regression.]

The estimated change in PM10 (and height of the brace) just north of the Huai River is 41.6 µg/m³ and is statistically significant (95% CI: 11.6, 71.6).
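The construction of each circle described in the notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the tuple layout, and the demo values are hypothetical and are not taken from the paper's data or code.

```python
import math
from collections import defaultdict


def binned_weighted_means(points, bin_width=1.0):
    """Population-weighted mean per latitude bin.

    points: iterable of (degrees_north, value, population) tuples.
    Returns {bin_lower_edge: (weighted_mean, total_population)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # bin index -> [weighted sum, population]
    for degrees_north, value, population in points:
        # floor() means no bin straddles the boundary at 0 degrees
        index = math.floor(degrees_north / bin_width)
        sums[index][0] += value * population
        sums[index][1] += population
    return {
        index * bin_width: (weighted_sum / population, population)
        for index, (weighted_sum, population) in sorted(sums.items())
    }


# Hypothetical locations: (degrees north of the boundary, PM10, population)
demo = [(-1.5, 90.0, 100), (-1.2, 110.0, 300), (0.4, 150.0, 200), (0.9, 130.0, 200)]
bins = binned_weighted_means(demo)
```

The second element of each result (total population in the bin) is what would drive the circle size in the figure.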

45 56

Visualizing a research strategy

Figure 3: Life Expectancy South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point (DSP) locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

[Scatter plot: life expectancy (years), axis 70 to 85, against degrees north of the Huai River boundary (-20 to 20). Legend: LE in South, LE in North, Local Linear Regression.]

The estimated change in Life Expectancy (and height of the brace) just north of the Huai River is -3.1 years and is statistically significant (95% CI: -5.0, -1.3).
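The "height of the brace" is the gap between the two fitted lines at the boundary. A minimal Python sketch of that idea follows. The triangular kernel, the bandwidth of 5 degrees, and the demo points are all assumptions made for illustration; the slide does not say which kernel or bandwidth the authors used, and this sketch ignores the population weights.

```python
def local_linear_at_zero(xs, ys, bandwidth):
    """Kernel-weighted least squares of y on x; returns the fitted value at x = 0."""
    # Triangular kernel: weight falls linearly to zero at |x| = bandwidth
    w = [max(0.0, 1.0 - abs(x) / bandwidth) for x in xs]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    denom = sw * swxx - swx ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x values inside the bandwidth")
    slope = (sw * swxy - swx * swy) / denom
    return (swy - slope * swx) / sw  # intercept = fitted value at the boundary


def jump_at_boundary(points, bandwidth=5.0):
    """Fit each side separately and return (north fit) - (south fit) at x = 0."""
    south = [(x, y) for x, y in points if x < 0]
    north = [(x, y) for x, y in points if x >= 0]
    fit_north = local_linear_at_zero([x for x, _ in north], [y for _, y in north], bandwidth)
    fit_south = local_linear_at_zero([x for x, _ in south], [y for _, y in south], bandwidth)
    return fit_north - fit_south


# Hypothetical binned observations: (degrees north, life expectancy)
demo_points = [(x, 80.0 + 0.5 * x) for x in (-4, -3, -2, -1)]
demo_points += [(x, 77.0 + 0.5 * x) for x in (1, 2, 3, 4)]
jump = jump_at_boundary(demo_points)
```

Because the demo points lie exactly on two parallel lines three years apart, the sketch recovers a jump of about -3.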

46 56

Tables

• Use the same principles:
   • Show the data
   • Don't lie about it
   • Focus
• Which translates into:
   • Report the important numbers, not all the coefficients
   • Keep the number of significant digits down
   • No vertical lines
   • Very few horizontal lines; don't box results (3 lines are usually enough)
   • If in doubt, align left
   • Report the statistics that the reader will need, not necessarily those that come by default
• Sample questions to ask yourself:
   • Do you need the R²?
   • Are there important tests to report (say, a test of equality of two coefficients)?
   • You may need the mean in the control group
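As a rough illustration of these layout rules, the Python sketch below renders a small results table with exactly three horizontal rules, no vertical lines, left-aligned cells, and two decimal places. The function name is hypothetical, and the numbers are columns (1) and (2) of Table 6 with decimal points placed by assumption (the extracted text has lost them).

```python
def three_rule_table(header, rows, digits=2):
    """Render rows as plain text with three horizontal rules and no vertical lines."""
    def cell(value):
        # Floats are rounded to keep significant digits down; everything else is printed as-is
        return f"{value:.{digits}f}" if isinstance(value, float) else str(value)

    cells = [[cell(v) for v in row] for row in [header] + rows]
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]
    lines = ["  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip() for row in cells]
    rule = "-" * (sum(widths) + 2 * (len(widths) - 1))
    # Rules: above the header, below the header, below the last row
    return "\n".join([rule, lines[0], rule] + lines[1:] + [rule])


table = three_rule_table(
    ["", "(1)", "(2)"],
    [
        ["Treatment", 0.0494, -0.210],
        ["", "(0.26)", "(0.41)"],
        ["Observations", 390, 390],
        ["Mean in control", 13.80, 11.69],
    ],
)
print(table)
```

The mean in the control group is reported instead of a constant so the reader can judge the size of the treatment coefficient directly.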

47 56

Table 6: Impact of the reform on MGNREGS projects: Evidence from asset survey

                    Number registered            Number found
                    All projects   Ongoing       All projects   Ongoing
                    (1)            (2)           (3)            (4)

Treatment           0.0494         -0.210        0.309          0.0271
                    (0.263)        (0.413)       (0.239)        (0.267)

Observations        390            390           385            385
Mean in control     13.80          11.69         11.79          9.819

Note: The unit of observation is a Gram Panchayat (GP). The dependent variables are the number of projects registered in the public data portal (nrega.nic.in) on May 15, 2013 (1), the number of projects declared as ongoing in nrega.nic.in (2), and the number of registered (3) and ongoing (4) projects found by surveyors in June-July 2013. Out of 5390 projects registered in nrega.nic.in for the 390 GPs of the survey sample, a random sample of 3900 projects was surveyed (10 per GP). The number of projects found in the survey is scaled up using the ratio of the number of registered projects to the number of sampled projects. 5 GPs (28 projects) could not be surveyed. All specifications include district fixed effects.
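The scaling step in the note is a single ratio. A small Python sketch, with a hypothetical function name and a hypothetical GP (the values below are not from the survey):

```python
def scaled_projects_found(found_in_sample, registered, sampled):
    """Scale the number of projects found in the sample up to the registered total."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return found_in_sample * registered / sampled


# Hypothetical GP: 14 registered projects, 10 sampled, 8 of those found by surveyors
estimate = scaled_projects_found(found_in_sample=8, registered=14, sampled=10)
```

For this hypothetical GP the estimate is 11.2 projects found out of 14 registered.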

Table 7: Impact of the reform on fake beneficiaries: Evidence from matching of nrega.nic.in job cards with SECC census

                    All job cards         Job cards with at least one working member
                    (as of April 2014)    Intervention period      Post intervention
                                          July 2012-March 2013     Apr 2013-March 2014
                    (1)                   (2)                      (3)

Panel A: Match rate for job cards with one member only
Treatment           0.0187                0.0181                   0.0107
                    (0.00741)             (0.00766)                (0.00696)
Observations        3095                  2868                     2922
Mean in control     0.644                 0.673                    0.698

Panel B: Match rate for job cards with two members or more
Treatment           0.0135                0.0126                   0.0104
                    (0.00613)             (0.00764)                (0.00732)
Observations        3093                  2836                     2906
Mean in control     0.243                 0.282                    0.286

Note: The unit of observation is a GP. The dependent variable is the fraction of job cards from nrega.nic.in matched by name with households from the SECC census. A job card with two members or more is matched when at least two members have been matched by name with a census household. The nrega.nic.in data was extracted from the nrega.nic.in server; it covers the period from July 2011 to March 2014. Treatment is a dummy equal to one for the blocks selected for the intervention. All specifications include district fixed effects.
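The dependent variable in Table 7 can be sketched as follows. The two-member rule comes from the note; treating a one-member card as matched when its single member is matched is an assumption, and the function names and demo cards are hypothetical.

```python
def card_is_matched(members_matched, members_total):
    """Apply the matching rule to one job card."""
    # Assumption: a one-member card needs its single member matched;
    # cards with two or more members need at least two matched members (per the note).
    required = 1 if members_total == 1 else 2
    return members_matched >= required


def match_rate(cards):
    """Fraction of matched job cards in one GP; cards is a list of (matched, total)."""
    if not cards:
        return None  # no job cards, rate undefined
    return sum(card_is_matched(m, t) for m, t in cards) / len(cards)


# Hypothetical GP with four job cards: (members matched, members on the card)
demo_cards = [(1, 1), (0, 1), (2, 3), (1, 2)]
rate = match_rate(demo_cards)
```

Here two of the four hypothetical cards count as matched, giving a rate of 0.5.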


48 56

Stargazer

• In practice, in R, use the "stargazer" package

• It will create standardized tables and can output them for you in HTML, LaTeX, Word, or ASCII

• The default is not bad

• But you can customize it to choose which statistics and which coefficients to include, etc.

• https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf

• http://jakeruss.com/cheatsheets/stargazer.html

49 56

Default output

Table 1: Results

                                 Dependent variable: ftvoteshare
                                 (1)            (2)

fncandidates                     0.134***
                                 (0.007)

RESprior                                        0.059**
                                                (0.024)

Constant                         0.012          0.094***
                                 (0.010)        (0.019)

Observations                     372            372
R²                               0.532          0.016
Adjusted R²                      0.530          0.013
Residual Std. Error (df = 370)   0.153          0.222
F Statistic (df = 1; 370)        420.093***     6.056**

Note: *p<0.1; **p<0.05; ***p<0.01

50 56

Improve on it

• Use transparent label names for dependent and independent variables

• Choose the coefficients you would like to include

• No need to include both R squared and adjusted R squared

• Maybe report the mean of the dependent variable in the control group rather than the constant
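These clean-up steps can be sketched independently of any table package. The Python below is only an illustration: the dictionary layout, the function name, and the readable label are hypothetical, and the numbers are column (1) of the default output with decimal points placed by assumption. In stargazer itself, arguments such as covariate.labels, dep.var.labels, keep, omit, keep.stat, and add.lines serve the same purpose.

```python
# Column (1) of the default output, keyed by the raw variable names
default_output = {
    "coefficients": {"fncandidates": (0.134, 0.007), "Constant": (0.012, 0.010)},
    "stats": {"Observations": 372, "R2": 0.532, "Adjusted R2": 0.530},
}

# Hypothetical readable label; the slide does not define fncandidates
labels = {"fncandidates": "Number of candidates (hypothetical label)"}


def tidy(output, labels, keep_coefficients, keep_stats):
    """Keep only the chosen coefficients and statistics, with readable labels."""
    rows = []
    for name in keep_coefficients:
        estimate, std_error = output["coefficients"][name]
        rows.append((labels.get(name, name), f"{estimate:.3f}", f"({std_error:.3f})"))
    for name in keep_stats:
        rows.append((name, str(output["stats"][name]), ""))
    return rows


# Drop the constant and the adjusted R2; keep one coefficient and two statistics
rows = tidy(default_output, labels, ["fncandidates"], ["Observations", "R2"])
```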

51 56

Visualizing regression results

Figure 1: Overview of Bandhan Results

52 56

Figure 2: Endline 1 - At a Glance

53 56

Tools and resources

• R is a great visualization tool (especially ggplot; you need to pick up a good book to choose colors, etc.)

• Yau's FlowingData has many tutorials on how to do things in R (http://flowingdata.com/category/tutorials/) and a 4-week mini-course you can take at your own pace

• R handles and produces maps as well

• At this site you can find Tufte's charts in R: http://motioninsocial.com/tufte/

54 56

References

• Edward Tufte, The Visual Display of Quantitative Information, Graphics Press, 2013 (second edition).

• Jonathan Schwabish, "An Economist's Guide to Visualizing Data," Journal of Economic Perspectives, 2014, vol. 28, no. 1.

• Banerjee, Chattopadhyay, Duflo, Shapiro, "The long term impact of a graduation program: Evidence from West Bengal," MIMEO, MIT.

• Duflo, Esther, "Schooling and Labor Market Consequences of School Construction in Indonesia," American Economic Review, 2001.

• Chetty, Raj, Michael Stepner, Sarah Abraham, Shelby Lin, Benjamin Scuderi, Nicholas Turner, Augustin Bergeron, and David Cutler, "The association between income and life expectancy in the United States, 2001-2014," JAMA 315, no. 16 (2016): 1750-1766.

55 56

References

• Bui, Quoctrung, and Neil Irwin, "The Rich Live Longer Everywhere. For the Poor, Geography Matters," New York Times, April 11, 2016.

• Chen, Yuyu, Avraham Ebenstein, Michael Greenstone, and Hongbin Li, "Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China's Huai River Policy."

• Lee, David S., Enrico Moretti, and Matthew J. Butler, "Do voters affect or elect policies? Evidence from the US House," The Quarterly Journal of Economics (2004): 807-859.

• Klerman, Jacob Alex, and Caroline Danielson, "The Transformation of the Supplemental Nutrition Assistance Program," Journal of Policy Analysis and Management, vol. 30, no. 4, 2011, pp. 863-888.

56 56

References

• Stinebrickner, Ralph, and Todd Stinebrickner, "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model," Western University, CIBC Working Paper 2013-5 (July), 2013. http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf

• Ottaviano, Gianmarco I. P., and Giovanni Peri, "Immigration and National Wages: Clarifying the Theory and the Empirics," NBER Working Paper 14188 (July), 2008.

• Social Security Advisory Board, Aspects of Disability Decision Making: Data and Materials, Social Security Administration (February), 2012. http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf

57 56

References

• Social Security Administration, Fast Facts & Figures About Social Security, 2009, Social Security Administration (July), 2009. http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf

• Hanson, Gordon H., "The Rise of Middle Kingdoms: Emerging Economies in Global Trade," Journal of Economic Perspectives, 2012, 26(2): 41-64.

58 56


An Economist's Guide to Visualizing Data, p. 221

Figure 6A: A Spaghetti Chart

[Line chart: Initial DI Worker Awards by Major Cause of Disability, Calendar Years 1975-2010; vertical axis 0 to 35; series: Mental, Cancer, Circulatory, Musculoskeletal.]

Source: Social Security Advisory Board (2012)

Figure 6B: Revising the Spaghetti Chart

[Small-multiple line charts, one panel per cause (Circulatory, Mental, Musculoskeletal, Cancer): Initial DI Worker Awards by Major Cause of Disability, Calendar Years 1975-2010 (Percent), with the first and last values labeled in each panel.]

33 56


An Economist's Guide to Visualizing Data, p. 225

[...] in this case and is a useful approach when labels are difficult to fit in the vertical column chart layout (recall Figure 3; also see Schwabish 2013c). Also notice the (subjective) decision to omit the y-axis; the usefulness of the y-axis is doubtful with data labels placed on top of each column.

Figure 9A: Two Pie Charts for Comparison

Aggregate income by source (percent)
1962: Social Security 30, Earnings 28, Other 16, Asset income 15, Government employee pensions 6, Private pensions 3
2007: Social Security 36, Earnings 29, Asset income 16, Private pensions 9, Government employee pensions 8, Other 3

Shares of Aggregate Income, 1962 and 2007

Source: Social Security Administration (2009)

Figure 9B: Alternative to a Pie Chart: A Paired Column Chart

Shares of Aggregate Income, 1962 and 2009 (Percent)

                                1962    2009
Social Security                 30      38
Earnings                        28      29
Asset income                    15      11
Private pensions                3       9
Government employee pensions    6       9
Other                           18      4

35 56


226 Journal of Economic Perspectives

Alternatively, the stacked bar chart in Figure 9C shows the distribution of the various groups and that the groups sum to 100 percent, while also highlighting differences from one year to the other. Finally, the slope chart in Figure 9D also shows the difference in each category from the first year to the last by pairing points on two vertical axes. Slope charts can be used for a variety of purposes, including showing correlations, for example, the relationship between a state's obesity rate and the share of people with at least a bachelor's degree (Cairo 2013). In this example, the color contrast (or what appears as different shades of grey in the black-and-white printed version) identifies which categories increased over time (blue, darker) and those that declined (orange, lighter).

Figure 9C: Alternative to a Pie Chart: A Stacked Bar Chart

Shares of Aggregate Income, 1962 and 2009 (Percent)

[Two stacked bars, one per year, with segments in this order: Social Security, Earnings, Other, Asset income, Government employee pensions, Private pensions.]

1962: 30, 28, 18, 15, 6, 3
2009: 38, 29, 4, 11, 9, 9

Figure 9D: Alternative to a Pie Chart: The Slope Chart

Shares of Aggregate Income, 1962 and 2009 (Percent)

[Slope chart pairing each category's 1962 value (left axis) with its 2009 value (right axis).]

                                1962    2009
Social Security                 30      38
Earnings                        28      29
Other                           18      4
Asset income                    15      11
Government employee pensions    6       9
Private pensions                3       9
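The two properties these charts rely on, shares summing to 100 percent and a color that encodes the direction of change, are easy to check in code. A minimal Python sketch using the 1962 and 2009 values from Figures 9C and 9D; the function name and the grey fallback for unchanged categories are assumptions, not part of the original charts.

```python
# (1962 share, 2009 share) in percent, as labeled in Figures 9C and 9D
shares = {
    "Social Security": (30, 38),
    "Earnings": (28, 29),
    "Other": (18, 4),
    "Asset income": (15, 11),
    "Government employee pensions": (6, 9),
    "Private pensions": (3, 9),
}


def line_color(start, end):
    """Blue for categories that increased, orange for those that declined."""
    if end > start:
        return "blue"
    if end < start:
        return "orange"
    return "grey"  # assumption: the article does not cover unchanged categories


colors = {name: line_color(start, end) for name, (start, end) in shares.items()}

# A stacked bar only makes sense if each year's shares add up to 100
totals = tuple(sum(pair[i] for pair in shares.values()) for i in (0, 1))
```

With these values, four categories come out blue and two (Other, Asset income) orange.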

38 56


A data set in pictures: Chetty et al., JAMA 2016

40 56

A data set in pictures: Chetty et al., JAMA 2016

women the comparable changes were 0.23 years in the highest quartile and 0.10 years in the lowest quartile (P < .001). These differences persisted after controlling for the higher growth rate of income for individuals in the top quartile relative to the bottom quartile (eTable 4 in the Supplement).

The lower panels of Figure 3 show the annual increase in race-adjusted life expectancy by income ventiles. The annual increase in longevity was 0.18 years for men (which translates to an increase of 2.34 years from 2001-2014) and 0.22 years for women (an increase of 2.91 years from 2001-2014) in the top 5% of the income distribution. In the bottom 5% of the income distribution, the average annual increase in longevity was 0.02 years (an increase of 0.32 years from 2001-2014) for men and 0.003 years (an increase of 0.04 years from 2001-2014) for women (P < .001 for the differences between top and bottom 5% of income distributions for both sexes).
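The link between the annual and cumulative figures in this paragraph can be checked with a few lines of Python. The sketch assumes the extracted figures read as 0.18, 2.34, and so on (the decimal points are missing from the extracted text) and that 2001-2014 spans 13 years. Under those assumptions each cumulative change, divided by 13, rounds back to the reported annual rate, which suggests the cumulative figures were computed from unrounded rates.

```python
YEARS = 2014 - 2001  # 13 years between the first and last year

# group -> (reported annual increase, reported cumulative increase, decimals reported)
reported = {
    "men, top 5%": (0.18, 2.34, 2),
    "women, top 5%": (0.22, 2.91, 2),
    "men, bottom 5%": (0.02, 0.32, 2),
    "women, bottom 5%": (0.003, 0.04, 3),
}


def implied_annual(cumulative, years=YEARS):
    """Annual increase implied by a cumulative change over the period."""
    return cumulative / years


# True where the implied rate rounds to the reported annual rate
consistent = {
    group: round(implied_annual(cumulative), decimals) == annual
    for group, (annual, cumulative, decimals) in reported.items()
}
```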

Local Area Variation in Life Expectancy by Income

Levels of Life Expectancy by Commuting Zone

Life expectancy varied significantly across areas within the United States, especially for low-income individuals. Figure 4 shows life expectancy by income ventile for New York, New York; San Francisco, California; Dallas, Texas; and Detroit, Michigan. There was substantial variation across these areas for low-income individuals but little variation for high-income individuals. Life expectancy ranged from 72.3 years to 78.6 years for men in the lowest income ventile across these 4 cities; the corresponding range for men in the top ventile was 86.5 years to 87.5 years.

The results in Figure 4 are representative of the variation across commuting zones more generally. The SD of life expectancy across all commuting zones (weighted by population) was 1.39 years for men in the bottom income quartile vs 0.70 years in the top income quartile (P < .001). Life expectancy varied less across areas for women than men in the bottom income quartile, and the amount of variation across commuting zones also declined with income for women (eTable 5 in the Supplement).

Figure 5 shows maps of expected age at death by commuting zone for men and women in the bottom and top quartiles of the national income distribution (maps for the middle-income quartiles appear in eFigure 10 in the Supplement). For individuals in the bottom income quartile, life expectancy differed by about 5 years for men and 4 years for women between the lowest and highest longevity commuting zones (P < .001 for both sexes). A summary of standard errors by commuting zone appears in part V.C of the eAppendix and in eFigure 11.

Nevada, Indiana, and Oklahoma had the lowest life expectancies (<77.9 years) when men and women in the bottom income quartile were averaged. Of the 10 states with the lowest levels of life expectancy for individuals in the bottom income quartile, 8 formed a geographic belt from Michigan to Kansas (Michigan, Ohio, Indiana, Kentucky, Tennessee, Arkansas, Oklahoma, Kansas). The states with the highest life expectancies for individuals in the bottom income quartile (>80.6 years) were California, New York, and Vermont. Life expectancy in the South was similar to the national mean for both sexes (-0.22 years [P = .47] for women and -0.96 years [P = .03] for men) in the bottom income quartile. Individuals in the top income quartile had the lowest life expectancies (<85.3 years) in Nevada, Hawaii, and Oklahoma. Individuals in the top income quartile had the highest life expectancies (>87.6 years) in Utah, Washington DC, and Vermont.

Table 1 lists the top 10 and bottom 10 commuting zones in mean life expectancy (averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The expected age at death for the bottom quartile ranged from 74.2 years for men and 80.7 years for women in Gary, Indiana, to 79.5 years for men and 84.0 years for women in New York, New York. The commuting zones with the highest life expectancies

Figure 2: Race- and Ethnicity-Adjusted Life Expectancy for 40-Year-Olds by Household Income Percentile, 2001-2014

[Figure: expected age at death for 40-year-olds (years, axis 70 to 90) by household income percentile (0 to 100, with mean household income in thousands of dollars), plotted separately for women and men.]

Expected age at death, y
Women, by household income percentile: Bottom 1%: 78.8 (95% CI, 78.7-78.9); Top 1%: 88.9 (95% CI, 88.7-89.1)
Men, by household income percentile: Bottom 1%: 72.7 (95% CI, 72.6-72.9); Top 1%: 87.3 (95% CI, 87.2-87.5)

Life expectancies were calculated using survival curves analogous to those in Figure 1. The vertical height of each bar depicts the 95% confidence interval. The difference between expected age at death in the top and bottom income percentiles is 10.1 years (95% CI, 9.9-10.3 years) for women and 14.6 years (95% CI, 14.4-14.8 years) for men. To control for differences in life expectancies across racial and ethnic groups, race and ethnicity adjustments were calculated using data from the National Longitudinal Mortality Survey, and estimates were reweighted so that each income percentile bin has the same fraction of black, Hispanic, and Asian adults.
a Averaged across years and ages. The data are in thousands unless otherwise indicated.

41 56

A data set in pictures: Chetty et al., JAMA 2016, as rendered in NYT

42 56

A data set in pictures: Chetty et al., JAMA 2016

were clustered in California (6 of the top 10), whereas the commuting zones with the lowest life expectancies were clustered in the industrial Midwest (5 of the bottom 10). The commuting zones with the highest life expectancies for those in the bottom income quartile also had the smallest gaps in life expectancy between the top and bottom quartiles (r = -0.82, P < .001). The expected age at death for the top income quartile ranged from 82.8 years for men and 85.3 years for women in Las Vegas, Nevada, to 86.6 years for men and 89.0 years for women in Salt Lake City, Utah. The areas with the highest and lowest life expectancies for those in the top income quartile were less clustered geographically; for example, California had commuting zones in both the top 10 and bottom 10 of the list.

The differences in life expectancy across commuting zones were similar in analyses with income measures adjusted for cost of living, with controls for differences across areas in the income distribution within each quartile, and using measures of loss in life years up to the age of 77 years that did not make use of extrapolations beyond observed ages (part IV.C of the eAppendix and eTable 6 in the Supplement). There was also considerable variation in life expectancy across counties within commuting zones (part V of the eAppendix, eFigure 12, and eTable 7).

Trends in Life Expectancy

Similar to levels of life expectancy, temporal trends varied significantly across geographic areas. Figure 6 maps the annual change in life expectancy between 2001 and 2014 by state for men and women in the bottom income quartile. Hawaii, Maine, and Massachusetts had the largest gains in life expectancy (gaining >0.19 years annually) when men and women in the bottom income quartile were averaged. The states in which low-income individuals experienced the largest losses in life expectancy (losing >0.09 years annually) were Alaska, Iowa, and Wyoming.

Table 2 lists the top 10 and bottom 10 commuting zones in terms of trends in life expectancy (when averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The estimated trends for individuals in the bottom income quartile ranged from an annual gain of 0.38 years in Toms River, New Jersey, to an annual loss of 0.17 years in Tampa, Florida. Gaps in life expectancy between the bottom and top income quartiles generally declined or remained stable in areas in which the bottom income quartile experienced the largest gains in life expectancy, such as Toms River, New Jersey. In contrast, gaps in life expectancy between the top and bottom income quartiles increased by approximately 0.3 years annually in places such as Tampa, Florida.

Figure 7 shows race- and ethnicity-adjusted life expectancies by year for men and women in the bottom income quartile in 2 commuting zones in the top 10 (Birmingham, Alabama, and Cincinnati, Ohio) and 2 commuting zones in the bottom 10 (Knoxville, Tennessee, and Tampa, Florida). This Figure shows that trends in life expectancy across these areas diverged continuously throughout the 2000s. For example, life expectancy increased by approximately 3.2 years from 2001 through 2014 for men and women in Cincinnati, Ohio, but declined by approximately 2.2 years in Tampa, Florida.

Correlates of Local Area Variation in Life Expectancy

Figure 8 shows correlations of commuting zone-level estimates of race- and ethnicity-adjusted life expectancy for the bottom income quartile with local area characteristics. The correlations are divided into 6 groups: health behaviors, access to health care, environmental factors, income inequality and social cohesion, local labor market conditions, and other factors. Data for men and women are combined; correlations were similar by sex (eTable 8 in the Supplement). County-level correlations were also similar (eTable 9).

Health Behaviors

Life expectancy was negatively correlated with rates of smoking (r = -0.69, P < .001) and obesity (r = -0.47, P < .001) and positively correlated with exercise rates (r = 0.32, P = .004) among individuals in the bottom income quartile. The maps for rates of smoking, obesity, and exercise among low-income individuals were similar to those for life expectancy (eFigure 13 in the Supplement).

Figure 4: Race- and Ethnicity-Adjusted Life Expectancy by Income Ventile in Selected Commuting Zones, 2001-2014

[Two panels (Men, Women): expected age at death for 40-year-olds (years, axis 70 to 90) by household income ventile, with mean household income in thousands of dollars, for Detroit MI, New York NY, San Francisco CA, and Dallas TX.]

Estimates of race- and ethnicity-adjusted expected age at death for 40-year-olds, computed by income ventile (5 percentile point bins).
a Averaged across years and ages.

43 56

Visualizing a research strategy

Figure 1

Pollution in China and the Huai RiverQinling Mountain Range

Notes The cities shown are the locations of the Disease Surveillance Points Cities north of the solid line were covered by the home heating policy The figure coloring is generated by interpolating PM10 levels at the 12 nearest pollution monitoring stations to create a high resolution grid of pollution throughout China (1 degree latitude cell width) Areas are left in white which are not within acceptable range of a station

44 56

Visualizing a research strategy

Figure 2Particulate Matter Levels (PM10) South and North of the Huai River Boundary

Notes Each observation (circle) is generated by averaging PM10 across the Disease Surveillance Point locations within a 1 degree latitude range weighted by the population at each location The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range The plotted line reports a local linear regression plot estimated separately on on each side of the Huai River

4060

8010

012

014

016

0

PM10

(4gm

3 )

-20 -15 -10 -5 0 5 10 15 20Degrees North of the Huai River Boundary

PM10 in South PM10 in North Local Linear Regression

The estimated change in PM10 (and height of the brace) just north of the Huai River is 416 microgm3 and is statistically significant (95 CI 116 716)

45 56

Visualinzing a research strategy

Figure 3Life Expectancy South and North of the Huai River Boundary

Notes Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point locations within a 1 degree latitude range weighted by the population at each location The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range The plotted line reports a local linear regression plot estimated separately on on each side of the Huai River

7075

8085

Life

Expe

ctanc

y(Ye

ars)

-20 -15 -10 -5 0 5 10 15 20Degrees North of the Huai River Boundary

LE in South LE in North Local Linear Regression

The estimated change in Life Expectancy (and height of the brace) just north of the Huai River is -31 years and is statistically significant (95 CI -50 -13)

46 56

Tablesbull Use the same principle

bull Show the databull Donrsquot lie about itbull Focus

bull Which translates inbull Report the important numbers not all the coefficientsbull Keep the number of significant digits downbull No vertical linesbull very few horizontal lines donrsquot box results (3 lines are usually

enough)bull in doubt align leftbull Report the statistics that the reader will need not necessary

those that come by defaultbull Sample questions to ask yourself

bull Do you need the R2bull Are there important tests to report (say between equality of

two coefficients)bull You may need the mean in the control group

47 56

Table 6: Impact of the reform on MGNREGS projects: Evidence from asset survey

                   Number registered         Number found
                   All projects   Ongoing    All projects   Ongoing
                   (1)            (2)        (3)            (4)
Treatment          0.0494         -0.210     0.309          0.0271
                   (0.263)        (0.413)    (0.239)        (0.267)
Observations       390            390        385            385
Mean in Control    13.80          11.69      11.79          9.819

Note: The unit of observation is a Gram Panchayat (GP). The dependent variables are the number of projects registered in the public data portal (nrega.nic.in) on May 15, 2013 (1), the number of projects declared as ongoing in nrega.nic.in (2), and the number of registered (3) and ongoing (4) projects found by surveyors in June-July 2013. Out of 5390 projects registered in nrega.nic.in for the 390 GP of the survey sample, a random sample of 3900 projects were surveyed (10 per GP). The number of projects found in the survey is scaled up using the number of registered projects divided by the number of sampled projects. 5 GP (28 projects) could not be surveyed. All specifications include district fixed effects.
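The scaling step in the table note can be written out as a one-line formula: projects found in the sample, times registered projects over sampled projects. The sketch below assumes that reading of the note; the function name and the example GP are hypothetical.

```python
def scaled_projects_found(found_in_sample, registered, sampled):
    """Scale the count found among sampled projects up to the full registered list."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return found_in_sample * registered / sampled


# Hypothetical GP: 14 projects registered, 10 sampled, 8 of those found by surveyors.
estimate = scaled_projects_found(found_in_sample=8, registered=14, sampled=10)
print(estimate)
```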

Table 7: Impact of the reform on fake beneficiaries: Evidence from matching of nrega.nic.in job cards with SECC census

                   All job cards         Job cards with at least one working member
                   (as of April 2014)    Intervention period       Post intervention
                                         July 2012 - March 2013    Apr 2013 - March 2014
                   (1)                   (2)                       (3)

Panel A: Match rate for job cards with one member only
Treatment          0.0187                0.0181                    0.0107
                   (0.00741)             (0.00766)                 (0.00696)
Observations       3095                  2868                      2922
Mean in Control    0.644                 0.673                     0.698

Panel B: Match rate for job cards with two members or more
Treatment          0.0135                0.0126                    0.0104
                   (0.00613)             (0.00764)                 (0.00732)
Observations       3093                  2836                      2906
Mean in Control    0.243                 0.282                     0.286

Note: The unit of observation is a GP. The dependent variable is the fraction of job cards from nrega.nic.in matched by name with households from the SECC census. A job card with two members or more is matched when at least two members have been matched by name with a census household. The nrega.nic.in data was extracted from the nrega.nic.in server; it covers the period from July 2011 to March 2014. Treatment is a dummy which is equal to one for the blocks selected for the intervention. All specifications include district fixed effects.
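The dependent variable in Table 7 is a match rate per GP. A small sketch of how such a rate could be computed is given below. The rule for cards with two or more members follows the table note; treating a one-member card as matched when its single member is matched is an assumption, and all names and data are hypothetical.

```python
def job_card_matched(n_members, n_members_matched):
    """One-member cards need that member matched (assumed); larger cards need at least two."""
    required = 1 if n_members == 1 else 2
    return n_members_matched >= required


def match_rate(job_cards):
    """Fraction of job cards in a GP that count as matched."""
    if not job_cards:
        return float("nan")
    matched = sum(job_card_matched(n, m) for n, m in job_cards)
    return matched / len(job_cards)


# Hypothetical GP: (members on the card, members matched by name in the census)
cards = [(1, 1), (1, 0), (3, 1), (4, 2)]
rate = match_rate(cards)
print(rate)
```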

48 56

Stargazer

• In practice, in R, use the "stargazer" package

• It will create standardized tables and can output them for you in HTML, LaTeX, Word, or ASCII

• The default is not bad

• But you can customize it to choose what statistics to include, what coefficients to include, etc.

• https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf

• http://jakeruss.com/cheatsheets/stargazer.html

49 56

Default output

Table 1: Results

                                  Dependent variable: ftvoteshare
                                  (1)           (2)
fncandidates                      0.134***
                                  (0.007)
RESprior                                        0.059**
                                                (0.024)
Constant                          0.012         0.094***
                                  (0.010)       (0.019)
Observations                      372           372
R²                                0.532         0.016
Adjusted R²                       0.530         0.013
Residual Std. Error (df = 370)    0.153         0.222
F Statistic (df = 1; 370)         420.093***    6.056**

Note: *p<0.1; **p<0.05; ***p<0.01

50 56

Improve on it

• Transparent label names for dependent and independent variables

• Choose the coefficients you would like to include

• No need to include both R squared and adjusted R squared

• Maybe report the mean of the dependent variable in the control group rather than the constant
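As a language-neutral illustration of these improvements, the Python sketch below takes column (1) of the default output, relabels the regressor, keeps only the chosen coefficient, and drops the adjusted R squared. It does not use stargazer itself. The readable label is hypothetical, since the slides do not say what `fncandidates` measures.

```python
# Column (1) of the default output, as (estimate, standard error).
result = {
    "coefficients": {"fncandidates": (0.134, 0.007), "Constant": (0.012, 0.010)},
    "stats": {"Observations": 372, "R2": 0.532, "Adjusted R2": 0.530},
}

# Hypothetical transparent label for the raw variable name.
labels = {"fncandidates": "Number of candidates", "R2": "R squared"}


def improved_rows(result, labels, keep_coefs, keep_stats):
    """Build (label, value) rows for the chosen coefficients and statistics only."""
    rows = []
    for name in keep_coefs:
        estimate, std_error = result["coefficients"][name]
        rows.append((labels.get(name, name), f"{estimate:.3f}"))
        rows.append(("", f"({std_error:.3f})"))
    for name in keep_stats:
        rows.append((labels.get(name, name), str(result["stats"][name])))
    return rows


rows = improved_rows(result, labels, ["fncandidates"], ["Observations", "R2"])
for label, value in rows:
    print(f"{label:<22}{value}")
```

A row for the control-group mean could be appended in the same way when the regressor is a treatment dummy; it is left out here because the slides do not report one for this regression.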

51 56

Visualizing regression results

Figure 1: Overview of Bandhan Results

52 56

Figure 2: Endline 1 - At a Glance

53 56

Tools and resources

• R is a great visualization tool (especially ggplot; you need to pick up a good book to choose colors, etc.)

• Yau's FlowingData has many tutorials on how to do things in R (http://flowingdata.com/category/tutorials) and a 4-week mini-course you can take at your own pace

• R handles and produces maps as well

• At this site you can find Tufte's charts in R: http://motioninsocial.com/tufte

54 56

References

• Edward Tufte, The Visual Display of Quantitative Information, Graphics Press, 2013 (second edition)

• Jonathan Schwabish, "An Economist's Guide to Visualizing Data," Journal of Economic Perspectives, 2014, vol. 28, no. 1

• Banerjee, Chattopadhyay, Duflo, Shapiro, "The long term impact of a graduation program: Evidence from West Bengal," mimeo, MIT

• Duflo, Esther, "Schooling and Labor Market Consequences of School Construction in Indonesia," American Economic Review, 2001

• Chetty, Raj, Michael Stepner, Sarah Abraham, Shelby Lin, Benjamin Scuderi, Nicholas Turner, Augustin Bergeron, and David Cutler, "The association between income and life expectancy in the United States, 2001-2014," JAMA 315, no. 16 (2016): 1750-1766

55 56

References

• Bui, Quoctrung, and Neil Irwin, "The Rich Live Longer Everywhere. For the Poor, Geography Matters," New York Times, April 11, 2016

• Chen, Yuyu, Avraham Ebenstein, Michael Greenstone, and Hongbin Li, "Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China's Huai River Policy"

• Lee, David S., Enrico Moretti, and Matthew J. Butler, "Do voters affect or elect policies? Evidence from the US House," The Quarterly Journal of Economics (2004): 807-859

• Klerman, Jacob Alex, and Caroline Danielson, "The Transformation of the Supplemental Nutrition Assistance Program," Journal of Policy Analysis and Management, vol. 30, no. 4, 2011, pp. 863-888

56 56

References

• Stinebrickner, Ralph, and Todd Stinebrickner, "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model," Western University CIBC Working Paper 2013-5 (July), http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf, 2013

• Ottaviano, Gianmarco I. P., and Giovanni Peri, "Immigration and National Wages: Clarifying the Theory and the Empirics," NBER Working Paper 14188 (July), 2008

• Social Security Advisory Board, Aspects of Disability Decision Making: Data and Materials, Social Security Administration (February), http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf, 2012

57 56

References

• Social Security Administration, Fast Facts & Figures About Social Security, 2009, Social Security Administration (July), http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf, 2009

• Hanson, Gordon H., 2012, "The Rise of Middle Kingdoms: Emerging Economies in Global Trade," Journal of Economic Perspectives 26(2): 41-64

58 56

An Economist's Guide to Visualizing Data 221

Figure 6A: A Spaghetti Chart

Source: Social Security Advisory Board (2012).

[Line chart titled "27. Initial DI Worker Awards by Major Cause of Disability, Calendar Years 1975-2010". Vertical axis from 0 to 35; horizontal axis from 1975 to 2010. Series: Mental, Cancer, Circulatory, Musculoskeletal.]

Figure 6B: Revising the Spaghetti Chart

Initial DI Worker Awards by Major Cause of Disability, Calendar Years 1975-2010 (Percent)

[Separate panels for Circulatory, Mental, Musculoskeletal, and Cancer, each with a horizontal axis from 1975 to 2010 and data labels at the ends of the lines.]

34 56

An Economist's Guide to Visualizing Data 225

in this case and is a useful approach when labels are difficult to fit in the vertical column chart layout (recall Figure 3; also see Schwabish 2013c). Also notice the (subjective) decision to omit the y-axis; the usefulness of the y-axis is doubtful with data labels placed on top of each column.

Figure 9A: Two Pie Charts for Comparison

Shares of Aggregate Income, 1962 and 2007

Source: Social Security Administration (2009).

[Two pie charts titled "Aggregate income, by source".
1962: Social Security 30, Earnings 28, Other 16, Asset income 15, Government employee pensions 6, Private pensions 3.
2007: Social Security 36, Earnings 29, Asset income 16, Private pensions 9, Government employee pensions 8, Other 3.]

Figure 9B: Alternative to a Pie Chart: A Paired Column Chart

Shares of Aggregate Income, 1962 and 2009 (Percent)

                                1962   2009
Social Security                 30     38
Earnings                        28     29
Asset income                    15     11
Private pensions                3      9
Government employee pensions    6      9
Other                           18     4

35 56

226 Journal of Economic Perspectives

Alternatively, the stacked bar chart in Figure 9C shows the distribution of the various groups and that the groups sum to 100 percent, while also highlighting differences from one year to the other. Finally, the slope chart in Figure 9D also shows the difference in each category from the first year to the last by pairing points on two vertical axes. Slope charts can be used for a variety of purposes, including showing correlations; for example, the relationship between a state's obesity rate and the share of people with at least a bachelor's degree (Cairo 2013). In this example, the color contrast (or what appears as different shades of grey in the black-and-white printed version) identifies which categories increased over time (blue, darker) and those that declined (orange, lighter).

Figure 9C: Alternative to a Pie Chart: A Stacked Bar Chart

Shares of Aggregate Income, 1962 and 2009 (Percent)

[Two stacked bars, one per year. Segments: Social Security, Earnings, Other, Asset income, Government employee pensions, Private pensions.
1962: 30, 28, 18, 15, 6, 3.
2009: 38, 29, 4, 11, 9, 9.]

Figure 9D: Alternative to a Pie Chart: The Slope Chart

Shares of Aggregate Income, 1962 and 2009 (Percent)

[Slope chart with two vertical axes, 1962 and 2009.
1962 labels: Social Security 30, Earnings 28, Other 18, Asset income 15, Government employee pensions 6, Private pensions 3.
2009 labels (values only): 38, 29, 11, 9, 4.]
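The slope chart colors each category by whether its share rose or fell between the two years. A minimal Python sketch of that classification is shown below. The shares are the data labels as read from the extracted Figure 9 text (the pairing of the 2009 values with categories follows the order in Figures 9B and 9C), and the function name is hypothetical.

```python
shares_1962 = {
    "Social Security": 30, "Earnings": 28, "Other": 18,
    "Asset income": 15, "Government employee pensions": 6, "Private pensions": 3,
}
shares_2009 = {
    "Social Security": 38, "Earnings": 29, "Other": 4,
    "Asset income": 11, "Government employee pensions": 9, "Private pensions": 9,
}


def slope_directions(start, end):
    """Label each category by the sign of its change between the two years."""
    directions = {}
    for category, first in start.items():
        last = end[category]
        if last > first:
            directions[category] = "increased"
        elif last < first:
            directions[category] = "declined"
        else:
            directions[category] = "unchanged"
    return directions


directions = slope_directions(shares_1962, shares_2009)
for category, direction in directions.items():
    print(f"{category}: {shares_1962[category]} -> {shares_2009[category]} ({direction})")
```

In a plotting library, the direction label would then pick one of two colors per line, which is the contrast the text describes.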

38 56

A data set in pictures: Chetty et al., JAMA 2016

40 56

A data set in pictures: Chetty et al., JAMA 2016

Copyright 2016 American Medical Association. All rights reserved.

women, the comparable changes were 0.23 years in the highest quartile and 0.10 years in the lowest quartile (P < .001). These differences persisted after controlling for the higher growth rate of income for individuals in the top quartile relative to the bottom quartile (eTable 4 in the Supplement).

The lower panels of Figure 3 show the annual increase in race-adjusted life expectancy by income ventiles. The annual increase in longevity was 0.18 years for men (which translates to an increase of 2.34 years from 2001-2014) and 0.22 years for women (an increase of 2.91 years from 2001-2014) in the top 5% of the income distribution. In the bottom 5% of the income distribution, the average annual increase in longevity was 0.02 years (an increase of 0.32 years from 2001-2014) for men and 0.003 years (an increase of 0.04 years from 2001-2014) for women (P < .001 for the differences between top and bottom 5% of income distributions for both sexes).

Local Area Variation in Life Expectancy by Income

Levels of Life Expectancy by Commuting Zone

Life expectancy varied significantly across areas within the United States, especially for low-income individuals. Figure 4 shows life expectancy by income ventile for New York, New York; San Francisco, California; Dallas, Texas; and Detroit, Michigan. There was substantial variation across these areas for low-income individuals, but little variation for high-income individuals. Life expectancy ranged from 72.3 years to 78.6 years for men in the lowest income ventile across these 4 cities; the corresponding range for men in the top ventile was 86.5 years to 87.5 years.

The results in Figure 4 are representative of the variation across commuting zones more generally. The SD of life expectancy across all commuting zones (weighted by population) was 1.39 years for men in the bottom income quartile vs 0.70 years in the top income quartile (P < .001). Life expectancy varied less across areas for women than men in the bottom income quartile, and the amount of variation across commuting zones also declined with income for women (eTable 5 in the Supplement).

Figure 5 shows maps of expected age at death by commuting zone for men and women in the bottom and top quartiles of the national income distribution (maps for the middle-income quartiles appear in eFigure 10 in the Supplement). For individuals in the bottom income quartile, life expectancy differed by about 5 years for men and 4 years for women between the lowest and highest longevity commuting zones (P < .001 for both sexes). A summary of standard errors by commuting zone appears in part V.C of the eAppendix and in eFigure 11.

Nevada, Indiana, and Oklahoma had the lowest life expectancies (<77.9 years) when men and women in the bottom income quartile were averaged. Of the 10 states with the lowest levels of life expectancy for individuals in the bottom income quartile, 8 formed a geographic belt from Michigan to Kansas (Michigan, Ohio, Indiana, Kentucky, Tennessee, Arkansas, Oklahoma, Kansas). The states with the highest life expectancies for individuals in the bottom income quartile (>80.6 years) were California, New York, and Vermont. Life expectancy in the South was similar to the national mean for both sexes (−0.22 years [P = .47] for women and −0.96 years [P = .03] for men) in the bottom income quartile. Individuals in the top income quartile had the lowest life expectancies (<85.3 years) in Nevada, Hawaii, and Oklahoma. Individuals in the top income quartile had the highest life expectancies (>87.6 years) in Utah, Washington, DC, and Vermont.

Table 1 lists the top 10 and bottom 10 commuting zones in mean life expectancy (averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The expected age at death for the bottom quartile ranged from 74.2 years for men and 80.7 years for women in Gary, Indiana, to 79.5 years for men and 84.0 years for women in New York, New York. The commuting zones with the highest life expectancies

Figure 2. Race- and Ethnicity-Adjusted Life Expectancy for 40-Year-Olds by Household Income Percentile, 2001-2014

[Plot area. Vertical axis: Expected Age at Death for 40-Year-Olds, y (70 to 90). Horizontal axis: Household Income Percentile (0 to 100), with mean household income in thousands, $ (a), shown beneath for women and men. Series: Women, Men.]

Expected age at death, y
Women, by household income percentile: Bottom 1%: 78.8 (95% CI, 78.7-78.9); Top 1%: 88.9 (95% CI, 88.7-89.1)
Men, by household income percentile: Bottom 1%: 72.7 (95% CI, 72.6-72.9); Top 1%: 87.3 (95% CI, 87.2-87.5)

Life expectancies were calculated using survival curves analogous to those in Figure 1. The vertical height of each bar depicts the 95% confidence interval. The difference between expected age at death in the top and bottom income percentiles is 10.1 years (95% CI, 9.9-10.3 years) for women and 14.6 years (95% CI, 14.4-14.8 years) for men. To control for differences in life expectancies across racial and ethnic groups, race and ethnicity adjustments were calculated using data from the National Longitudinal Mortality Survey, and estimates were reweighted so that each income percentile bin has the same fraction of black, Hispanic, and Asian adults.
a Averaged across years and ages. The data are in thousands unless otherwise indicated.
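As a quick arithmetic check of the caption, the top-minus-bottom gaps can be recomputed from the four point estimates. The sketch below assumes the decimal readings 78.8, 88.9, 72.7, and 87.3 for the values printed in the figure legend.

```python
# Expected age at death for 40-year-olds, bottom and top 1% of household income.
life_expectancy = {
    "women": {"bottom_1pct": 78.8, "top_1pct": 88.9},
    "men": {"bottom_1pct": 72.7, "top_1pct": 87.3},
}

# Gap between the top and bottom income percentiles, rounded to one decimal.
gaps = {
    sex: round(values["top_1pct"] - values["bottom_1pct"], 1)
    for sex, values in life_expectancy.items()
}
print(gaps)
```

The recomputed gaps line up with the 10.1 and 14.6 years stated in the caption.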

Association Between Income and Life Expectancy in the United States. Special Communication, Clinical Review & Education

jama.com (Reprinted) JAMA, Published online April 10, 2016, E5

41 56

A data set in pictures: Chetty et al., JAMA 2016, as rendered in NYT

42 56

A data set in pictures: Chetty et al., JAMA 2016

were clustered in California (6 of the top 10), whereas the commuting zones with the lowest life expectancies were clustered in the industrial Midwest (5 of the bottom 10). The commuting zones with the highest life expectancies for those in the bottom income quartile also had the smallest gaps in life expectancy between the top and bottom quartiles (r = −0.82, P < .001). The expected age at death for the top income quartile ranged from 82.8 years for men and 85.3 years for women in Las Vegas, Nevada, to 86.6 years for men and 89.0 years for women in Salt Lake City, Utah. The areas with the highest and lowest life expectancies for those in the top income quartile were less clustered geographically; for example, California had commuting zones in both the top 10 and bottom 10 of the list.

The differences in life expectancy across commuting zones were similar in analyses with income measures adjusted for cost of living, with controls for differences across areas in the income distribution within each quartile, and using measures of loss in life years up to the age of 77 years that did not make use of extrapolations beyond observed ages (part IV.C of the eAppendix and eTable 6 in the Supplement). There was also considerable variation in life expectancy across counties within commuting zones (part V of the eAppendix, eFigure 12, and eTable 7).

Trends in Life Expectancy

Similar to levels of life expectancy, temporal trends varied significantly across geographic areas. Figure 6 maps the annual change in life expectancy between 2001 and 2014 by state for men and women in the bottom income quartile. Hawaii, Maine, and Massachusetts had the largest gains in life expectancy (gaining >0.19 years annually) when men and women in the bottom income quartile were averaged. The states in which low-income individuals experienced the largest losses in life expectancy (losing >0.09 years annually) were Alaska, Iowa, and Wyoming.

Table 2 lists the top 10 and bottom 10 commuting zones in terms of trends in life expectancy (when averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The estimated trends for individuals in the bottom income quartile ranged from an annual gain of 0.38 years in Toms River, New Jersey, to an annual loss of 0.17 years in Tampa, Florida. Gaps in life expectancy between the bottom and top income quartiles generally declined or remained stable in areas in which the bottom income quartile experienced the largest gains in life expectancy, such as Toms River, New Jersey. In contrast, gaps in life expectancy between the top and bottom income quartiles increased by approximately 0.3 years annually in places such as Tampa, Florida.

Figure 7 shows race- and ethnicity-adjusted life expectancies by year for men and women in the bottom income quartile in 2 commuting zones in the top 10 (Birmingham, Alabama, and Cincinnati, Ohio) and 2 commuting zones in the bottom 10 (Knoxville, Tennessee, and Tampa, Florida). This Figure shows that trends in life expectancy across these areas diverged continuously throughout the 2000s. For example, life expectancy increased by approximately 3.2 years from 2001 through 2014 for men and women in Cincinnati, Ohio, but declined by approximately 2.2 years in Tampa, Florida.

Correlates of Local Area Variation in Life Expectancy

Figure 8 shows correlations of commuting zone-level estimates of race- and ethnicity-adjusted life expectancy for the bottom income quartile with local area characteristics. The correlations are divided into 6 groups: health behaviors, access to health care, environmental factors, income inequality and social cohesion, local labor market conditions, and other factors. Data for men and women are combined; correlations were similar by sex (eTable 8 in the Supplement). County-level correlations were also similar (eTable 9).

Health Behaviors

Life expectancy was negatively correlated with rates of smoking (r = −0.69, P < .001) and obesity (r = −0.47, P < .001) and positively correlated with exercise rates (r = 0.32, P = .004) among individuals in the bottom income quartile. The maps for rates of smoking, obesity, and exercise among low-income individuals were similar to those for life expectancy (eFigure 13 in the Supplement).

Figure 4. Race- and Ethnicity-Adjusted Life Expectancy by Income Ventile in Selected Commuting Zones, 2001-2014

[Two panels, Men and Women. Vertical axis: Expected Age at Death for 40-Year-Olds, y (70 to 90). Horizontal axis: Household Income Ventile (0 to 20), with mean household income in thousands, $ (a), shown beneath. Series in each panel: Detroit, MI; New York, NY; San Francisco, CA; Dallas, TX.]

Estimates of race- and ethnicity-adjusted expected age at death for 40-year-olds, computed by income ventile (5 percentile point bins).
a Averaged across years and ages.

43 56

Visualizing a research strategy

Figure 1: Pollution in China and the Huai River/Qinling Mountain Range

Notes: The cities shown are the locations of the Disease Surveillance Points. Cities north of the solid line were covered by the home heating policy. The figure coloring is generated by interpolating PM10 levels at the 12 nearest pollution monitoring stations to create a high resolution grid of pollution throughout China (1 degree latitude cell width). Areas that are not within acceptable range of a station are left in white.

44 56

Visualizing a research strategy

Figure 2: Particulate Matter Levels (PM10) South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging PM10 across the Disease Surveillance Point (DSP) locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

[Plot area. Vertical axis: PM10 (μg/m³), ticks from 40 to 160. Horizontal axis: Degrees North of the Huai River Boundary, from -20 to 20. Legend: PM10 in South, PM10 in North, Local Linear Regression.]

The estimated change in PM10 (and height of the brace) just north of the Huai River is 41.6 μg/m³ and is statistically significant (95% CI: 11.6, 71.6).
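The brace in the figure is the gap between two local linear fits evaluated at the boundary. The Python sketch below shows the idea under stated assumptions: a triangular kernel and a bandwidth of 5 degrees are illustrative choices, not the authors' specification, and the data are synthetic with a known jump.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def boundary_jump(points, bandwidth=5.0):
    """points: (degrees_north, outcome) pairs.

    Fits a kernel-weighted line on each side of 0 and returns the
    north-minus-south difference in fitted values at the boundary.
    """
    def intercept_at_zero(side):
        xs = [x for x, _ in side]
        ys = [y for _, y in side]
        ws = [1 - abs(x) / bandwidth for x in xs]  # triangular kernel weights
        return weighted_line_fit(xs, ys, ws)[0]

    south = [(x, y) for x, y in points if -bandwidth < x < 0]
    north = [(x, y) for x, y in points if 0 <= x < bandwidth]
    return intercept_at_zero(north) - intercept_at_zero(south)


# Synthetic data: same slope on both sides, level drops by 3 north of the boundary.
south_points = [(x, 80 + 0.1 * x) for x in (-4, -3, -2, -1)]
north_points = [(x, 77 + 0.1 * x) for x in (0.5, 1.5, 2.5, 3.5)]
jump = boundary_jump(south_points + north_points)
print(round(jump, 3))
```

Because the synthetic data are exactly linear on each side, the fitted gap recovers the built-in drop; with real data the estimate depends on the bandwidth and kernel chosen.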

45 56

Visualinzing a research strategy

Figure 3Life Expectancy South and North of the Huai River Boundary

Notes Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point locations within a 1 degree latitude range weighted by the population at each location The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range The plotted line reports a local linear regression plot estimated separately on on each side of the Huai River

7075

8085

Life

Expe

ctanc

y(Ye

ars)

-20 -15 -10 -5 0 5 10 15 20Degrees North of the Huai River Boundary

LE in South LE in North Local Linear Regression

The estimated change in Life Expectancy (and height of the brace) just north of the Huai River is -31 years and is statistically significant (95 CI -50 -13)

46 56

Tablesbull Use the same principle

bull Show the databull Donrsquot lie about itbull Focus

bull Which translates inbull Report the important numbers not all the coefficientsbull Keep the number of significant digits downbull No vertical linesbull very few horizontal lines donrsquot box results (3 lines are usually

enough)bull in doubt align leftbull Report the statistics that the reader will need not necessary

those that come by defaultbull Sample questions to ask yourself

bull Do you need the R2bull Are there important tests to report (say between equality of

two coefficients)bull You may need the mean in the control group

47 56

Table 6 Impact of the reform on MGNREGS projects Evidence from asset survey

All Projects Ongoing All Projects Ongoing

(1) (2) (3) (4)

Treatment 00494 -0210 0309 00271

(0263) (0413) (0239) (0267)

Observations 390 390 385 385Mean in Control 1380 1169 1179 9819

Number foundNumber Registered

Note the unit of observation is a Gram Panchayat (GP) The dependent variables are the

number of projects registered in the public data portal (nreganicin) on May 15 2013 (1) the

number of projects declared as ongoing in nreganicin (2) the number of registered (3) and

ongoing (4) projects found by surveyors in June-July 2013 Out of 5390 projects registered in

nreganicin for the 390 GP of the survey sample a random sample of 3900 projects were

surveyed (10 per GP) The number of projects found in the survey is scaled up using the

number of registered projects divided by the number of sampled projects rate 5 GP (28

projects) could not be surveyed All specifications include district fixed effects

Table 7 Impact of the reform on fake beneficiaries Evidence from matching ofnreganicin job cards with SECC census

All job cards

Intervention period Post intervention

(as of April 2014) July 2012-March

2013

Apr 2013 - March

2014(1) (2) (3)

Treatment 00187 00181 00107

(000741) (000766) (000696)

Observations 3095 2868 2922

Mean in Control 0644 0673 0698

Treatment 00135 00126 00104

(000613) (000764) (000732)

Observations 3093 2836 2906

Mean in Control 0243 0282 0286

Panel A Match Rate for job cards with one member only

Panel B Match Rate for job cards with two members or more

Job cards with at least one working

member

Note The unit of observation is a GP The dependent variable is the fraction of job cards from nreganicin

matched by name with households from the SECC census A job card with two members or more is matched

when at least to members have been matched by name with a census household The nreganicin data was

extracted from the nreganicin server it covers the period from July 2011 to March 2014 Treatment is a

dummy which is equal to one for the blocks selected for the intervention All specifications include district

fixed effects

39

48 56

Stargazer

bull In practice in R use the rdquostargazerrdquo package

bull It will create standardized tables can output them for you inhtml latex word ASCII

bull The default is not bad

bull But you can customize it to chose what statistics to includechose what coefficients to include etc

bull httpscranr-projectorgwebpackages

stargazervignettesstargazerpdf

bull httpjakerusscomcheatsheetsstargazerhtml

49 56

Default output

Table 1: Results

                                  Dependent variable: ftvoteshare
                                  (1)             (2)
fncandidates                      0.134***
                                  (0.007)
RESprior                                          0.059**
                                                  (0.024)
Constant                          0.012           0.094***
                                  (0.010)         (0.019)
Observations                      372             372
R2                                0.532           0.016
Adjusted R2                       0.530           0.013
Residual Std. Error (df = 370)    0.153           0.222
F Statistic (df = 1; 370)         420.093***      6.056**

Note: *p<0.1; **p<0.05; ***p<0.01
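The star convention in the table note can be checked by hand from the printed coefficients and standard errors. The sketch below is only an illustration in Python (not the R code behind the slide) and assumes a normal approximation to the t distribution, which is reasonable with 370 degrees of freedom; the helper names `two_sided_p` and `stars` are hypothetical.

```python
import math


def two_sided_p(coef: float, se: float) -> float:
    """Two-sided p-value for H0: coefficient = 0 (normal approximation, an assumption)."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2))


def stars(p: float) -> str:
    """Map a p-value to the convention in the note: *p<0.1; **p<0.05; ***p<0.01."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


# (coefficient, standard error) pairs as printed in the default output.
estimates = {
    "fncandidates (1)": (0.134, 0.007),
    "RESprior (2)": (0.059, 0.024),
    "Constant (1)": (0.012, 0.010),
    "Constant (2)": (0.094, 0.019),
}

for name, (coef, se) in estimates.items():
    print(f"{name:18s} {coef:6.3f}{stars(two_sided_p(coef, se))}")
```

Under this approximation the stars agree with the ones printed in the table, including the unstarred constant in column (1).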

Improve on it

• Transparent label names for dependent and independent variables
• Choose the coefficients you would like to include
• No need to include both R squared and adjusted R squared
• Maybe the mean of the dependent variable in the control group rather than the constant

Visualizing regression results

Figure 1: Overview of Bandhan Results

Figure 2: Endline 1 - At a Glance

Tools and resources

• R is a great visualization tool (especially ggplot; you need to pick up a good book to choose colors, etc.)
• Yau's FlowingData has many tutorials for how to do things in R (http://flowingdata.com/category/tutorials) and a 4-week mini-course you can take at your own pace
• R handles/produces maps as well
• At this site you can find Tufte's charts in R: http://motioninsocial.com/tufte

References

• Tufte, Edward. The Visual Display of Quantitative Information. Graphics Press, 2013 (second edition).
• Schwabish, Jonathan. "An Economist's Guide to Visualizing Data." Journal of Economic Perspectives, 2014, vol. 28, number 1.
• Banerjee, Chattopadhyay, Duflo, Shapiro. "The long term impact of a graduation program: Evidence from West Bengal." Mimeo, MIT.
• Duflo, Esther. "Schooling and Labor Market Consequences of School Construction in Indonesia." American Economic Review, 2001.
• Chetty, Raj, Michael Stepner, Sarah Abraham, Shelby Lin, Benjamin Scuderi, Nicholas Turner, Augustin Bergeron, and David Cutler. "The association between income and life expectancy in the United States, 2001-2014." JAMA 315, no. 16 (2016): 1750-1766.
• Bui, Quoctrung, and Neil Irwin. "The Rich Live Longer Everywhere. For the Poor, Geography Matters." New York Times, April 11, 2016.
• Chen, Yuyu, Avraham Ebenstein, Michael Greenstone, and Hongbin Li. "Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China's Huai River Policy."
• Lee, David S., Enrico Moretti, and Matthew J. Butler. "Do voters affect or elect policies? Evidence from the US House." The Quarterly Journal of Economics (2004): 807-859.
• Klerman, Jacob Alex, and Caroline Danielson. "The Transformation of the Supplemental Nutrition Assistance Program." Journal of Policy Analysis and Management, vol. 30, no. 4, 2011, pp. 863-888.
• Stinebrickner, Ralph, and Todd Stinebrickner. "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model." Western University, CIBC Working Paper 2013-5 (July), 2013. http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf
• Ottaviano, Gianmarco I. P., and Giovanni Peri. "Immigration and National Wages: Clarifying the Theory and the Empirics." NBER Working Paper 14188 (July), 2008.
• Social Security Advisory Board. Aspects of Disability Decision Making: Data and Materials. Social Security Administration (February), 2012. http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf
• Social Security Administration. Fast Facts & Figures About Social Security, 2009. Social Security Administration (July), 2009. http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf
• Hanson, Gordon H. 2012. "The Rise of Middle Kingdoms: Emerging Economies in Global Trade." Journal of Economic Perspectives 26(2): 41-64.


An Economist's Guide to Visualizing Data 225

in this case and is a useful approach when labels are difficult to fit in the vertical column chart layout (recall Figure 3; also see Schwabish 2013c). Also notice the (subjective) decision to omit the y-axis; the usefulness of the y-axis is doubtful with data labels placed on top of each column.

Figure 9A: Two Pie Charts for Comparison
Aggregate income by source; Shares of Aggregate Income, 1962 and 2007
1962: Social Security 30, Earnings 28, Other 16, Asset income 15, Government employee pensions 6, Private pensions 3
2007: Social Security 36, Earnings 29, Asset income 16, Private pensions 9, Government employee pensions 8, Other 3
Source: Social Security Administration (2009)

Figure 9B: Alternative to a Pie Chart: A Paired Column Chart
Shares of Aggregate Income, 1962 and 2009 (Percent)
1962: Social Security 30, Earnings 28, Asset income 15, Private pensions 3, Government employee pensions 6, Other 18
2009: Social Security 38, Earnings 29, Asset income 11, Private pensions 9, Government employee pensions 9, Other 4


226 Journal of Economic Perspectives

Alternatively, the stacked bar chart in Figure 9C shows the distribution of the various groups and that the groups sum to 100 percent, while also highlighting differences from one year to the other. Finally, the slope chart in Figure 9D also shows the difference in each category from the first year to the last by pairing points on two vertical axes. Slope charts can be used for a variety of purposes, including showing correlations: for example, the relationship between a state's obesity rate and the share of people with at least a bachelor's degree (Cairo 2013). In this example, the color contrast (or what appears as different shades of grey in the black-and-white printed version) identifies which categories increased over time (blue, darker) and those that declined (orange, lighter).

Figure 9C: Alternative to a Pie Chart: A Stacked Bar Chart
Shares of Aggregate Income, 1962 and 2009 (Percent)
1962: Social Security 30, Earnings 28, Other 18, Asset income 15, Government employee pensions 6, Private pensions 3
2009: Social Security 38, Earnings 29, Other 4, Asset income 11, Government employee pensions 9, Private pensions 9

Figure 9D: Alternative to a Pie Chart: The Slope Chart
Shares of Aggregate Income, 1962 and 2009 (Percent)
1962 axis labels: Social Security 30, Earnings 28, Other 18, Asset income 15, Government employee pensions 6, Private pensions 3
2009 axis labels: 38, 29, 11, 9, 4
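The data preparation behind a slope chart like Figure 9D can be sketched in a few lines. This is an illustration in Python rather than the article's own workflow: the shares are the data labels recovered from Figures 9B to 9D, while the function name `direction_color` and the neutral color for unchanged categories are assumptions.

```python
# Shares of aggregate income (percent) as (1962, 2009), read from the data labels in Figures 9B-9D.
shares = {
    "Social Security": (30, 38),
    "Earnings": (28, 29),
    "Asset income": (15, 11),
    "Private pensions": (3, 9),
    "Government employee pensions": (6, 9),
    "Other": (18, 4),
}


def direction_color(start: int, end: int) -> str:
    """Color rule described in the text: blue if the share rose, orange if it fell."""
    if end > start:
        return "blue"
    if end < start:
        return "orange"
    return "grey"  # assumption: the text does not say how unchanged categories are drawn


# The stacked bar chart relies on each year's shares summing to 100 percent.
total_1962 = sum(start for start, _ in shares.values())
total_2009 = sum(end for _, end in shares.values())

slopes = {
    name: (end - start, direction_color(start, end))
    for name, (start, end) in shares.items()
}

print(f"1962 total: {total_1962}, 2009 total: {total_2009}")
for name, (change, color) in sorted(slopes.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{name:30s} {change:+3d} points  {color}")
```

Sorting by the size of the change is one possible ordering; the slope chart itself orders categories by their position on each axis.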


A data set in pictures: Chetty et al., JAMA 2016

Copyright 2016 American Medical Association. All rights reserved.

women, the comparable changes were 0.23 years in the highest quartile and 0.10 years in the lowest quartile (P < .001). These differences persisted after controlling for the higher growth rate of income for individuals in the top quartile relative to the bottom quartile (eTable 4 in the Supplement).

The lower panels of Figure 3 show the annual increase in race-adjusted life expectancy by income ventiles. The annual increase in longevity was 0.18 years for men (which translates to an increase of 2.34 years from 2001-2014) and 0.22 years for women (an increase of 2.91 years from 2001-2014) in the top 5% of the income distribution. In the bottom 5% of the income distribution, the average annual increase in longevity was 0.02 years (an increase of 0.32 years from 2001-2014) for men and 0.003 years (an increase of 0.04 years from 2001-2014) for women (P < .001 for the differences between top and bottom 5% of income distributions for both sexes).

Local Area Variation in Life Expectancy by Income

Levels of Life Expectancy by Commuting Zone

Life expectancy varied significantly across areas within the United States, especially for low-income individuals. Figure 4 shows life expectancy by income ventile for New York, New York; San Francisco, California; Dallas, Texas; and Detroit, Michigan. There was substantial variation across these areas for low-income individuals, but little variation for high-income individuals. Life expectancy ranged from 72.3 years to 78.6 years for men in the lowest income ventile across these 4 cities; the corresponding range for men in the top ventile was 86.5 years to 87.5 years.

The results in Figure 4 are representative of the variation across commuting zones more generally. The SD of life expectancy across all commuting zones (weighted by population) was 1.39 years for men in the bottom income quartile vs 0.70 years in the top income quartile (P < .001). Life expectancy varied less across areas for women than men in the bottom income quartile, and the amount of variation across commuting zones also declined with income for women (eTable 5 in the Supplement).

Figure 5 shows maps of expected age at death by commuting zone for men and women in the bottom and top quartiles of the national income distribution (maps for the middle-income quartiles appear in eFigure 10 in the Supplement). For individuals in the bottom income quartile, life expectancy differed by about 5 years for men and 4 years for women between the lowest and highest longevity commuting zones (P < .001 for both sexes). A summary of standard errors by commuting zone appears in part V.C of the eAppendix and in eFigure 11.

Nevada, Indiana, and Oklahoma had the lowest life expectancies (<77.9 years) when men and women in the bottom income quartile were averaged. Of the 10 states with the lowest levels of life expectancy for individuals in the bottom income quartile, 8 formed a geographic belt from Michigan to Kansas (Michigan, Ohio, Indiana, Kentucky, Tennessee, Arkansas, Oklahoma, Kansas). The states with the highest life expectancies for individuals in the bottom income quartile (>80.6 years) were California, New York, and Vermont. Life expectancy in the South was similar to the national mean for both sexes (−0.22 years [P = .47] for women and −0.96 years [P = .03] for men) in the bottom income quartile. Individuals in the top income quartile had the lowest life expectancies (<85.3 years) in Nevada, Hawaii, and Oklahoma. Individuals in the top income quartile had the highest life expectancies (>87.6 years) in Utah, Washington, DC, and Vermont.

Table 1 lists the top 10 and bottom 10 commuting zones in mean life expectancy (averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The expected age at death for the bottom quartile ranged from 74.2 years for men and 80.7 years for women in Gary, Indiana, to 79.5 years for men and 84.0 years for women in New York, New York. The commuting zones with the highest life expectancies

Figure 2: Race- and Ethnicity-Adjusted Life Expectancy for 40-Year-Olds by Household Income Percentile, 2001-2014

[Figure: expected age at death for 40-year-olds, in years (vertical axis, 70 to 90), against household income percentile (horizontal axis, 0 to 100, with mean household income in thousands of dollars shown beneath); separate series for women and men.]

Expected age at death, y
Women, by household income percentile: Bottom 1%: 78.8 (95% CI, 78.7-78.9); Top 1%: 88.9 (95% CI, 88.7-89.1)
Men, by household income percentile: Bottom 1%: 72.7 (95% CI, 72.6-72.9); Top 1%: 87.3 (95% CI, 87.2-87.5)

Life expectancies were calculated using survival curves analogous to those in Figure 1. The vertical height of each bar depicts the 95% confidence interval. The difference between expected age at death in the top and bottom income percentiles is 10.1 years (95% CI, 9.9-10.3 years) for women and 14.6 years (95% CI, 14.4-14.8 years) for men. To control for differences in life expectancies across racial and ethnic groups, race and ethnicity adjustments were calculated using data from the National Longitudinal Mortality Survey, and estimates were reweighted so that each income percentile bin has the same fraction of black, Hispanic, and Asian adults.
a Averaged across years and ages. The data are in thousands unless otherwise indicated.
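The top-versus-bottom gaps quoted in the caption follow directly from the point estimates printed on the figure. A minimal sketch in Python, assuming the decimal placement restored above; the confidence intervals for the gaps cannot be reproduced from the labels alone, so only the point estimates are computed, and the names used here are illustrative.

```python
# Point estimates of expected age at death (years), from the Figure 2 labels.
expected_age = {
    "Women": {"bottom_1pct": 78.8, "top_1pct": 88.9},
    "Men": {"bottom_1pct": 72.7, "top_1pct": 87.3},
}


def top_bottom_gap(group: dict) -> float:
    """Difference between the top and bottom income percentile, rounded to 0.1 year."""
    return round(group["top_1pct"] - group["bottom_1pct"], 1)


gaps = {sex: top_bottom_gap(values) for sex, values in expected_age.items()}

for sex, gap in gaps.items():
    print(f"{sex}: {gap:.1f} years")
```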

Association Between Income and Life Expectancy in the United States. Special Communication, Clinical Review & Education

jama.com (Reprinted) JAMA, Published online April 10, 2016, E5


A data set in pictures: Chetty et al., JAMA 2016, as rendered in NYT

A data set in pictures: Chetty et al., JAMA 2016

were clustered in California (6 of the top 10), whereas the commuting zones with the lowest life expectancies were clustered in the industrial Midwest (5 of the bottom 10). The commuting zones with the highest life expectancies for those in the bottom income quartile also had the smallest gaps in life expectancy between the top and bottom quartiles (r = −0.82, P < .001). The expected age at death for the top income quartile ranged from 82.8 years for men and 85.3 years for women in Las Vegas, Nevada, to 86.6 years for men and 89.0 years for women in Salt Lake City, Utah. The areas with the highest and lowest life expectancies for those in the top income quartile were less clustered geographically; for example, California had commuting zones in both the top 10 and bottom 10 of the list.

The differences in life expectancy across commuting zones were similar in analyses with income measures adjusted for cost of living, with controls for differences across areas in the income distribution within each quartile, and using measures of loss in life years up to the age of 77 years that did not make use of extrapolations beyond observed ages (part IV.C of the eAppendix and eTable 6 in the Supplement). There was also considerable variation in life expectancy across counties within commuting zones (part V of the eAppendix, eFigure 12, and eTable 7).

Trends in Life Expectancy

Similar to levels of life expectancy, temporal trends varied significantly across geographic areas. Figure 6 maps the annual change in life expectancy between 2001 and 2014 by state for men and women in the bottom income quartile. Hawaii, Maine, and Massachusetts had the largest gains in life expectancy (gaining >0.19 years annually) when men and women in the bottom income quartile were averaged. The states in which low-income individuals experienced the largest losses in life expectancy (losing >0.09 years annually) were Alaska, Iowa, and Wyoming.

Table 2 lists the top 10 and bottom 10 commuting zones in terms of trends in life expectancy (when averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The estimated trends for individuals in the bottom income quartile ranged from an annual gain of 0.38 years in Toms River, New Jersey, to an annual loss of 0.17 years in Tampa, Florida. Gaps in life expectancy between the bottom and top income quartiles generally declined or remained stable in areas in which the bottom income quartile experienced the largest gains in life expectancy, such as Toms River, New Jersey. In contrast, gaps in life expectancy between the top and bottom income quartiles increased by approximately 0.3 years annually in places such as Tampa, Florida.

Figure 7 shows race- and ethnicity-adjusted life expectancies by year for men and women in the bottom income quartile in 2 commuting zones in the top 10 (Birmingham, Alabama, and Cincinnati, Ohio) and 2 commuting zones in the bottom 10 (Knoxville, Tennessee, and Tampa, Florida). This Figure shows that trends in life expectancy across these areas diverged continuously throughout the 2000s. For example, life expectancy increased by approximately 3.2 years from 2001 through 2014 for men and women in Cincinnati, Ohio, but declined by approximately 2.2 years in Tampa, Florida.

Correlates of Local Area Variation in Life Expectancy

Figure 8 shows correlations of commuting zone-level estimates of race- and ethnicity-adjusted life expectancy for the bottom income quartile with local area characteristics. The correlations are divided into 6 groups: health behaviors, access to health care, environmental factors, income inequality and social cohesion, local labor market conditions, and other factors. Data for men and women are combined; correlations were similar by sex (eTable 8 in the Supplement). County-level correlations were also similar (eTable 9).

Health Behaviors

Life expectancy was negatively correlated with rates of smoking (r = −0.69, P < .001) and obesity (r = −0.47, P < .001) and positively correlated with exercise rates (r = 0.32, P = .004) among individuals in the bottom income quartile. The maps for rates of smoking, obesity, and exercise among low-income individuals were similar to those for life expectancy (eFigure 13 in the Supplement).

Figure 4: Race- and Ethnicity-Adjusted Life Expectancy by Income Ventile in Selected Commuting Zones, 2001-2014

[Figure: two panels, Men and Women. Each plots expected age at death for 40-year-olds, in years (vertical axis, 70 to 90), against household income ventile (horizontal axis, 0 to 20, with mean household income in thousands of dollars shown beneath) for Detroit, MI; New York, NY; San Francisco, CA; and Dallas, TX.]

Estimates of race- and ethnicity-adjusted expected age at death for 40-year-olds computed by income ventile (5 percentile point bins).
a Averaged across years and ages.


Visualizing a research strategy

Figure 1: Pollution in China and the Huai River/Qinling Mountain Range

Notes: The cities shown are the locations of the Disease Surveillance Points. Cities north of the solid line were covered by the home heating policy. The figure coloring is generated by interpolating PM10 levels at the 12 nearest pollution monitoring stations to create a high-resolution grid of pollution throughout China (1 degree latitude cell width). Areas that are not within acceptable range of a station are left in white.

Figure 2: Particulate Matter Levels (PM10) South and North of the Huai River Boundary

[Figure: PM10 in μg/m3 (vertical axis, 40 to 160) against degrees north of the Huai River boundary (horizontal axis, -20 to 20); series: PM10 in South, PM10 in North, Local Linear Regression.]

Notes: Each observation (circle) is generated by averaging PM10 across the Disease Surveillance Point locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

The estimated change in PM10 (and height of the brace) just north of the Huai River is 41.6 μg/m3 and is statistically significant (95% CI: 11.6, 71.6).

Figure 3: Life Expectancy South and North of the Huai River Boundary

[Figure: life expectancy in years (vertical axis, 70 to 85) against degrees north of the Huai River boundary (horizontal axis, -20 to 20); series: LE in South, LE in North, Local Linear Regression.]

Notes: Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

The estimated change in Life Expectancy (and height of the brace) just north of the Huai River is -3.1 years and is statistically significant (95% CI: -5.0, -1.3).
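The notes describe a local linear regression fitted separately on each side of the boundary, with the jump read off where the two fits meet. The sketch below illustrates that idea in Python under stated assumptions: a triangular kernel, a hand-picked bandwidth, and synthetic noiseless data with a built-in jump of 40 units. None of these choices come from the paper, and the function names are hypothetical.

```python
def fit_at_boundary(points, bandwidth):
    """Kernel-weighted least-squares line through (x, y, w) points; returns the fit at x = 0."""
    sw = swx = swy = swxx = swxy = 0.0
    for x, y, w in points:
        # Triangular kernel (an assumption) times the observation weight, e.g. population.
        k = w * max(0.0, 1.0 - abs(x) / bandwidth)
        sw += k
        swx += k * x
        swy += k * y
        swxx += k * x * x
        swxy += k * x * y
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        raise ValueError("need at least two distinct x values inside the bandwidth")
    slope = (sw * swxy - swx * swy) / denom
    return (swy - slope * swx) / sw  # intercept, i.e. the fitted value at the boundary


def boundary_jump(points, bandwidth):
    """North-side fit minus south-side fit at the boundary (x = 0)."""
    south = [p for p in points if p[0] < 0]
    north = [p for p in points if p[0] > 0]
    return fit_at_boundary(north, bandwidth) - fit_at_boundary(south, bandwidth)


# Synthetic data, not from the paper: x is degrees north of the boundary,
# y jumps by 40 units at x = 0, and every observation has weight 1.
xs = [d + 0.5 for d in range(-10, 10)]  # -9.5, -8.5, ..., 9.5
points = [(x, (100 + 1.5 * x) if x < 0 else (140 - 0.5 * x), 1.0) for x in xs]

jump = boundary_jump(points, bandwidth=5.0)
print(round(jump, 6))
```

In applied work the bandwidth and kernel are analysis choices that deserve their own justification, and inference on the jump requires standard errors that this sketch does not compute.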

Tables

• Use the same principle
    • Show the data
    • Don't lie about it
    • Focus
• Which translates into
    • Report the important numbers, not all the coefficients
    • Keep the number of significant digits down
    • No vertical lines
    • Very few horizontal lines; don't box results (3 lines are usually enough)
    • In doubt, align left
    • Report the statistics that the reader will need, not necessarily those that come by default
• Sample questions to ask yourself
    • Do you need the R2?
    • Are there important tests to report (say, a test of equality of two coefficients)?
    • You may need the mean in the control group
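The formatting rules above can be sketched as a small plain-text renderer. This is an illustration in Python, not a substitute for a table package: the function names `fmt` and `render_table` are hypothetical, and the numbers are Panel A of Table 7 rounded to three decimals.

```python
def fmt(value, digits=3):
    """Format numbers with few decimals; strings (e.g. bracketed SEs) pass through unchanged."""
    return value if isinstance(value, str) else f"{value:.{digits}f}"


def render_table(header, rows, digits=3):
    """Plain-text table: left-aligned cells, three horizontal rules, no vertical lines."""
    cells = [[row[0]] + [fmt(v, digits) for v in row[1:]] for row in rows]
    widths = [max(len(r[i]) for r in [header] + cells) for i in range(len(header))]

    def line(r):
        return "  ".join(c.ljust(w) for c, w in zip(r, widths)).rstrip()

    rule = "-" * (sum(widths) + 2 * (len(widths) - 1))
    return "\n".join([rule, line(header), rule] + [line(r) for r in cells] + [rule])


header = ["", "(1)", "(2)", "(3)"]
rows = [
    ["Treatment", 0.0187, 0.0181, 0.0107],
    ["", "(0.007)", "(0.008)", "(0.007)"],   # standard errors, pre-formatted
    ["Mean in control", 0.644, 0.673, 0.698],
    ["Observations", "3095", "2868", "2922"],
]

table = render_table(header, rows)
print(table)
```

The control-group mean is reported in place of a constant, and statistics the reader does not need are simply left out of `rows`.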

Table 6: Impact of the reform on MGNREGS projects: Evidence from asset survey

                   Number registered             Number found
                   All projects    Ongoing       All projects    Ongoing
                   (1)             (2)           (3)             (4)
Treatment          0.0494          -0.210        0.309           0.0271
                   (0.263)         (0.413)       (0.239)         (0.267)
Observations       390             390           385             385
Mean in Control    13.80           11.69         11.79           9.819

Note: The unit of observation is a Gram Panchayat (GP). The dependent variables are (1) the number of projects registered in the public data portal (nrega.nic.in) on May 15, 2013; (2) the number of projects declared as ongoing in nrega.nic.in; and the number of (3) registered and (4) ongoing projects found by surveyors in June-July 2013. Out of 5390 projects registered in nrega.nic.in for the 390 GP of the survey sample, a random sample of 3900 projects was surveyed (10 per GP). The number of projects found in the survey is scaled up by the ratio of the number of registered projects to the number of sampled projects. 5 GP (28 projects) could not be surveyed. All specifications include district fixed effects.
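The scaling step in the note can be written out as a one-line calculation. The sketch below is in Python; the function name and the example GP are hypothetical, and it assumes the scaling is applied GP by GP, which the note does not state explicitly.

```python
def scaled_found(found_in_sample: int, registered: int, sampled: int) -> float:
    """Scale the number of sampled projects that were found up to the GP's full registry."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return found_in_sample * registered / sampled


# Hypothetical GP: 14 registered projects, 10 sampled, 8 of them located by surveyors.
estimate = scaled_found(found_in_sample=8, registered=14, sampled=10)
print(f"{estimate:.1f}")
```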


Page 8: 216 Journal of Economic Perspectives - edX€¦ · Also notice the olum n chart layout (recall Figure 3; also see Schw abish 2013c). A lso notice the ((subjective) decision to omit

An Economist's Guide to Visualizing Data, p. 225

in this case and is a useful approach when labels are difficult to fit in the vertical column chart layout (recall Figure 3; also see Schwabish 2013c). Also notice the (subjective) decision to omit the y-axis; the usefulness of the y-axis is doubtful with data labels placed on top of each column.

[Figure 9A: Two Pie Charts for Comparison. Aggregate income by source, one pie for 1962 and one for 2007. Title: Shares of Aggregate Income, 1962 and 2007. Source: Social Security Administration (2009).]

[Figure 9B: Alternative to a Pie Chart: A Paired Column Chart. Shares of Aggregate Income, 1962 and 2009 (Percent). Column labels, 1962 then 2009: Social Security 30, 38; Earnings 28, 29; Asset income 15, 11; Private pensions 3, 9; Government employee pensions 6, 9; Other 18, 4.]


Journal of Economic Perspectives, p. 226

Alternatively, the stacked bar chart in Figure 9C shows the distribution of the various groups and that the groups sum to 100 percent, while also highlighting differences from one year to the other. Finally, the slope chart in Figure 9D also shows the difference in each category from the first year to the last by pairing points on two vertical axes. Slope charts can be used for a variety of purposes, including showing correlations; for example, the relationship between a state's obesity rate and the share of people with at least a bachelor's degree (Cairo 2013). In this example, the color contrast (or what appears as different shades of grey in the black-and-white printed version) identifies which categories increased over time (blue, darker) and those that declined (orange, lighter).

[Figure 9C: Alternative to a Pie Chart: A Stacked Bar Chart. Shares of Aggregate Income, 1962 and 2009 (Percent). One bar per year; segments, 1962 then 2009: Social Security 30, 38; Earnings 28, 29; Other 18, 4; Asset income 15, 11; Government employee pensions 6, 9; Private pensions 3, 9.]

[Figure 9D: Alternative to a Pie Chart: The Slope Chart. Shares of Aggregate Income, 1962 and 2009 (Percent). Left axis (1962): Social Security 30, Earnings 28, Other 18, Asset income 15, Government employee pensions 6, Private pensions 3. Right axis (2009): 38, 29, 11, 9, 9, 4.]
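The slope chart encodes one number per category: the change between the two years, whose sign determines the color. As a minimal sketch of that bookkeeping (in Python rather than the R used elsewhere in these slides), the block below takes the shares as read from the Figure 9B-9D labels and derives the change and direction for each category. The pairing of the 2009 values with categories is inferred from the figure labels, and the function name `slope_rows` is purely illustrative.

```python
# Shares of aggregate income (percent): (1962, 2009).
# Values as read from the Figure 9B-9D labels; the category pairing is inferred.
shares = {
    "Social Security": (30, 38),
    "Earnings": (28, 29),
    "Asset income": (15, 11),
    "Private pensions": (3, 9),
    "Government employee pensions": (6, 9),
    "Other": (18, 4),
}


def slope_rows(data):
    """Return (category, start, end, change, direction), largest start first."""
    rows = []
    for name, (start, end) in data.items():
        change = end - start
        if change > 0:
            direction = "increased"
        elif change < 0:
            direction = "declined"
        else:
            direction = "flat"
        rows.append((name, start, end, change, direction))
    return sorted(rows, key=lambda r: r[1], reverse=True)


rows = slope_rows(shares)
for name, start, end, change, direction in rows:
    print(f"{name:<30}{start:>4}{end:>4}{change:>+5}  {direction}")
```

The direction column is what the chart maps to color (darker for increases, lighter for declines); both years sum to 100, which is the property the stacked bar chart makes visible.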


A data set in pictures: Chetty et al., JAMA 2016

Copyright 2016 American Medical Association. All rights reserved.

women, the comparable changes were 0.23 years in the highest quartile and 0.10 years in the lowest quartile (P < .001). These differences persisted after controlling for the higher growth rate of income for individuals in the top quartile relative to the bottom quartile (eTable 4 in the Supplement).

The lower panels of Figure 3 show the annual increase in race-adjusted life expectancy by income ventiles. The annual increase in longevity was 0.18 years for men (which translates to an increase of 2.34 years from 2001-2014) and 0.22 years for women (an increase of 2.91 years from 2001-2014) in the top 5% of the income distribution. In the bottom 5% of the income distribution, the average annual increase in longevity was 0.02 years (an increase of 0.32 years from 2001-2014) for men and 0.003 years (an increase of 0.04 years from 2001-2014) for women (P < .001 for the differences between top and bottom 5% of income distributions for both sexes).
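The step from an annual increase to a cumulative one can be checked by hand. The sketch below assumes the cumulative figures are the annual rate multiplied by the 13 years between 2001 and 2014; that multiplier is an inference from the men's top 5% figure (0.18 × 13 = 2.34), not something the excerpt states. Where the product differs from the reported total, the likely explanation is that the annual rates shown are rounded, which is also an assumption.

```python
YEARS = 2014 - 2001  # 13 annual steps between the first and last year

# (group, reported annual increase, reported total increase), in years
reported = [
    ("men, top 5%", 0.18, 2.34),
    ("women, top 5%", 0.22, 2.91),
    ("men, bottom 5%", 0.02, 0.32),
    ("women, bottom 5%", 0.003, 0.04),
]


def implied_total(annual, years=YEARS):
    """Cumulative change implied by a constant annual change."""
    return round(annual * years, 2)


for group, annual, total in reported:
    print(f"{group:<18} {annual} x {YEARS} = {implied_total(annual):.2f} (reported {total})")
```

Two of the four rows reproduce the reported totals exactly; the other two differ in the second decimal, which is a useful reminder that totals computed from rounded table entries need not match totals computed from the underlying estimates.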

Local Area Variation in Life Expectancy by Income

Levels of Life Expectancy by Commuting Zone

Life expectancy varied significantly across areas within the United States, especially for low-income individuals. Figure 4 shows life expectancy by income ventile for New York, New York; San Francisco, California; Dallas, Texas; and Detroit, Michigan. There was substantial variation across these areas for low-income individuals but little variation for high-income individuals. Life expectancy ranged from 72.3 years to 78.6 years for men in the lowest income ventile across these 4 cities; the corresponding range for men in the top ventile was 86.5 years to 87.5 years.

The results in Figure 4 are representative of the variation across commuting zones more generally. The SD of life expectancy across all commuting zones (weighted by population) was 1.39 years for men in the bottom income quartile vs 0.70 years in the top income quartile (P < .001). Life expectancy varied less across areas for women than men in the bottom income quartile, and the amount of variation across commuting zones also declined with income for women (eTable 5 in the Supplement).

Figure 5 shows maps of expected age at death by commuting zone for men and women in the bottom and top quartiles of the national income distribution (maps for the middle-income quartiles appear in eFigure 10 in the Supplement). For individuals in the bottom income quartile, life expectancy differed by about 5 years for men and 4 years for women between the lowest and highest longevity commuting zones (P < .001 for both sexes). A summary of standard errors by commuting zone appears in part V.C of the eAppendix and in eFigure 11.

Nevada, Indiana, and Oklahoma had the lowest life expectancies (<77.9 years) when men and women in the bottom income quartile were averaged. Of the 10 states with the lowest levels of life expectancy for individuals in the bottom income quartile, 8 formed a geographic belt from Michigan to Kansas (Michigan, Ohio, Indiana, Kentucky, Tennessee, Arkansas, Oklahoma, Kansas). The states with the highest life expectancies for individuals in the bottom income quartile (>80.6 years) were California, New York, and Vermont. Life expectancy in the South was similar to the national mean for both sexes (−0.22 years [P = .47] for women and −0.96 years [P = .03] for men) in the bottom income quartile. Individuals in the top income quartile had the lowest life expectancies (<85.3 years) in Nevada, Hawaii, and Oklahoma. Individuals in the top income quartile had the highest life expectancies (>87.6 years) in Utah, Washington, DC, and Vermont.

Table 1 lists the top 10 and bottom 10 commuting zones in mean life expectancy (averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The expected age at death for the bottom quartile ranged from 74.2 years for men and 80.7 years for women in Gary, Indiana, to 79.5 years for men and 84.0 years for women in New York, New York. The commuting zones with the highest life expectancies

[Figure 2: Race- and Ethnicity-Adjusted Life Expectancy for 40-Year-Olds by Household Income Percentile, 2001-2014. Vertical axis: expected age at death for 40-year-olds, y (70 to 90). Horizontal axis: household income percentile (0 to 100), with mean household income in thousands, $ (a), shown beneath for women and men. Series: Women, Men.]

Expected age at death, y:
Women, by household income percentile: bottom 1%: 78.8 (95% CI, 78.7-78.9); top 1%: 88.9 (95% CI, 88.7-89.1)
Men, by household income percentile: bottom 1%: 72.7 (95% CI, 72.6-72.9); top 1%: 87.3 (95% CI, 87.2-87.5)

Life expectancies were calculated using survival curves analogous to those in Figure 1. The vertical height of each bar depicts the 95% confidence interval. The difference between expected age at death in the top and bottom income percentiles is 10.1 years (95% CI, 9.9-10.3 years) for women and 14.6 years (95% CI, 14.4-14.8 years) for men. To control for differences in life expectancies across racial and ethnic groups, race and ethnicity adjustments were calculated using data from the National Longitudinal Mortality Survey, and estimates were reweighted so that each income percentile bin has the same fraction of black, Hispanic, and Asian adults.
a Averaged across years and ages. The data are in thousands unless otherwise indicated.

Association Between Income and Life Expectancy in the United States. Special Communication, Clinical Review & Education. jama.com, (Reprinted) JAMA, published online April 10, 2016, p. E5.
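The headline gaps quoted in the Figure 2 note follow directly from the four point estimates in the legend. A small sketch of that subtraction is given below; the dictionary layout and the function name `top_bottom_gap` are illustrative choices, and the confidence intervals for the gaps cannot be recovered this way because they depend on the sampling covariance of the two estimates.

```python
# Expected age at death for 40-year-olds (years), from the Figure 2 legend.
expected_age = {
    "women": {"bottom 1%": 78.8, "top 1%": 88.9},
    "men": {"bottom 1%": 72.7, "top 1%": 87.3},
}


def top_bottom_gap(by_group):
    """Top-minus-bottom percentile gap in expected age at death, per group."""
    return {
        group: round(values["top 1%"] - values["bottom 1%"], 1)
        for group, values in by_group.items()
    }


gaps = top_bottom_gap(expected_age)
print(gaps)
```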

A data set in pictures: Chetty et al., JAMA 2016, as rendered in NYT

A data set in pictures: Chetty et al., JAMA 2016

were clustered in California (6 of the top 10), whereas the commuting zones with the lowest life expectancies were clustered in the industrial Midwest (5 of the bottom 10). The commuting zones with the highest life expectancies for those in the bottom income quartile also had the smallest gaps in life expectancy between the top and bottom quartiles (r = −0.82, P < .001). The expected age at death for the top income quartile ranged from 82.8 years for men and 85.3 years for women in Las Vegas, Nevada, to 86.6 years for men and 89.0 years for women in Salt Lake City, Utah. The areas with the highest and lowest life expectancies for those in the top income quartile were less clustered geographically; for example, California had commuting zones in both the top 10 and bottom 10 of the list.

The differences in life expectancy across commuting zones were similar in analyses with income measures adjusted for cost of living, with controls for differences across areas in the income distribution within each quartile, and using measures of loss in life years up to the age of 77 years that did not make use of extrapolations beyond observed ages (part IV.C of the eAppendix and eTable 6 in the Supplement). There was also considerable variation in life expectancy across counties within commuting zones (part V of the eAppendix, eFigure 12, and eTable 7).

Trends in Life Expectancy

Similar to levels of life expectancy, temporal trends varied significantly across geographic areas. Figure 6 maps the annual change in life expectancy between 2001 and 2014 by state for men and women in the bottom income quartile. Hawaii, Maine, and Massachusetts had the largest gains in life expectancy (gaining >0.19 years annually) when men and women in the bottom income quartile were averaged. The states in which low-income individuals experienced the largest losses in life expectancy (losing >0.09 years annually) were Alaska, Iowa, and Wyoming.

Table 2 lists the top 10 and bottom 10 commuting zones in terms of trends in life expectancy (when averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The estimated trends for individuals in the bottom income quartile ranged from an annual gain of 0.38 years in Toms River, New Jersey, to an annual loss of 0.17 years in Tampa, Florida. Gaps in life expectancy between the bottom and top income quartiles generally declined or remained stable in areas in which the bottom income quartile experienced the largest gains in life expectancy, such as Toms River, New Jersey. In contrast, gaps in life expectancy between the top and bottom income quartiles increased by approximately 0.3 years annually in places such as Tampa, Florida.

Figure 7 shows race- and ethnicity-adjusted life expectancies by year for men and women in the bottom income quartile in 2 commuting zones in the top 10 (Birmingham, Alabama, and Cincinnati, Ohio) and 2 commuting zones in the bottom 10 (Knoxville, Tennessee, and Tampa, Florida). This Figure shows that trends in life expectancy across these areas diverged continuously throughout the 2000s. For example, life expectancy increased by approximately 3.2 years from 2001 through 2014 for men and women in Cincinnati, Ohio, but declined by approximately 2.2 years in Tampa, Florida.

Correlates of Local Area Variation in Life Expectancy

Figure 8 shows correlations of commuting zone-level estimates of race- and ethnicity-adjusted life expectancy for the bottom income quartile with local area characteristics. The correlations are divided into 6 groups: health behaviors, access to health care, environmental factors, income inequality and social cohesion, local labor market conditions, and other factors. Data for men and women are combined; correlations were similar by sex (eTable 8 in the Supplement). County-level correlations were also similar (eTable 9).

Health Behaviors

Life expectancy was negatively correlated with rates of smoking (r = −0.69, P < .001) and obesity (r = −0.47, P < .001) and positively correlated with exercise rates (r = 0.32, P = .004) among individuals in the bottom income quartile. The maps for rates of smoking, obesity, and exercise among low-income individuals were similar to those for life expectancy (eFigure 13 in the Supplement).

[Figure 4: Race- and Ethnicity-Adjusted Life Expectancy by Income Ventile in Selected Commuting Zones, 2001-2014. Two panels: Men and Women. Vertical axis: expected age at death for 40-year-olds, y (70 to 90). Horizontal axis: household income ventile (0 to 20), with mean household income in thousands, $ (a), shown beneath. Series in each panel: Detroit, MI; New York, NY; San Francisco, CA; Dallas, TX.]

Estimates of race- and ethnicity-adjusted expected age at death for 40-year-olds computed by income ventile (5 percentile point bins).
a Averaged across years and ages.

Visualizing a research strategy

Figure 1: Pollution in China and the Huai River/Qinling Mountain Range

Notes: The cities shown are the locations of the Disease Surveillance Points. Cities north of the solid line were covered by the home heating policy. The figure coloring is generated by interpolating PM10 levels at the 12 nearest pollution monitoring stations to create a high resolution grid of pollution throughout China (1 degree latitude cell width). Areas that are not within an acceptable range of a station are left in white.

Figure 2: Particulate Matter Levels (PM10) South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging PM10 across the Disease Surveillance Point (DSP) locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

[Axes: PM10 (µg/m³), 40 to 160, against degrees north of the Huai River boundary, -20 to 20. Series: PM10 in South, PM10 in North, Local Linear Regression.]

The estimated change in PM10 (and height of the brace) just north of the Huai River is 41.6 µg/m³ and is statistically significant (95% CI: 11.6, 71.6).

Figure 3: Life Expectancy South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

[Axes: life expectancy (years), 70 to 85, against degrees north of the Huai River boundary, -20 to 20. Series: LE in South, LE in North, Local Linear Regression.]

The estimated change in life expectancy (and height of the brace) just north of the Huai River is -3.1 years and is statistically significant (95% CI: -5.0, -1.3).
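The brace in Figures 2 and 3 is the gap between two one-sided local linear fits evaluated at the boundary. The sketch below illustrates that idea in plain Python under several assumptions that are not taken from the paper: a triangular kernel, a single hand-picked bandwidth, no population weights, and synthetic data constructed with a known jump of 40. The function names are illustrative, and the block does not reproduce the authors' estimates or their confidence intervals.

```python
def local_linear_at_boundary(xs, ys, bandwidth):
    """Fitted value at x = 0 from a kernel-weighted least-squares line."""
    # Triangular kernel: weights fall linearly to zero at |x| = bandwidth.
    pts = [(x, y, 1 - abs(x) / bandwidth) for x, y in zip(xs, ys) if abs(x) < bandwidth]
    total = sum(w for _, _, w in pts)
    mean_x = sum(w * x for x, _, w in pts) / total
    mean_y = sum(w * y for _, y, w in pts) / total
    sxx = sum(w * (x - mean_x) ** 2 for x, _, w in pts)
    sxy = sum(w * (x - mean_x) * (y - mean_y) for x, y, w in pts)
    slope = sxy / sxx
    return mean_y - slope * mean_x  # intercept, i.e. the fit at the boundary


def boundary_jump(xs, ys, bandwidth):
    """North-minus-south difference of the two one-sided fits at x = 0."""
    south = [(x, y) for x, y in zip(xs, ys) if x < 0]
    north = [(x, y) for x, y in zip(xs, ys) if x >= 0]
    fit_south = local_linear_at_boundary(*zip(*south), bandwidth)
    fit_north = local_linear_at_boundary(*zip(*north), bandwidth)
    return fit_north - fit_south


# Synthetic data (not the paper's): common slope of 2, jump of 40 at the boundary.
degrees = [d + 0.5 for d in range(-10, 10)]
pm10 = [100 + 2 * d + (40 if d >= 0 else 0) for d in degrees]
print(round(boundary_jump(degrees, pm10, bandwidth=8), 6))
```

Because the synthetic data are exactly linear on each side, the two fits recover the intercepts and the printed jump equals the constructed one; with real data the estimate would depend on the bandwidth and kernel chosen.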

Tables

• Use the same principles:
    • Show the data
    • Don't lie about it
    • Focus
• Which translates into:
    • Report the important numbers, not all the coefficients
    • Keep the number of significant digits down
    • No vertical lines
    • Very few horizontal lines; don't box results (3 lines are usually enough)
    • If in doubt, align left
    • Report the statistics that the reader will need, not necessarily those that come by default
• Sample questions to ask yourself:
    • Do you need the R²?
    • Are there important tests to report (say, a test of equality of two coefficients)?
    • You may need the mean in the control group
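"Keep the number of significant digits down" is easy to automate before numbers reach a table. The helper below is a minimal sketch of rounding to a fixed number of significant digits; the function name `round_sig`, the default of three digits, and the raw values are all hypothetical and chosen only for illustration.

```python
from math import floor, log10


def round_sig(x, digits=3):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    magnitude = floor(log10(abs(x)))  # position of the leading digit
    return round(x, digits - 1 - magnitude)


# Hypothetical raw estimates as they might come out of a regression routine.
raw = {"Treatment": 0.049412345, "Std. error": 0.26318822, "Mean in control": 13.80456}
for name, value in raw.items():
    print(f"{name:<16}{round_sig(value):>8}")
```

Rounding by significant digits rather than by decimal places keeps small coefficients readable without padding large ones with spurious precision.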

Table 6: Impact of the reform on MGNREGS projects: Evidence from asset survey

                      Number registered           Number found
                   All projects   Ongoing    All projects   Ongoing
                       (1)          (2)          (3)          (4)
Treatment             0.0494      -0.210        0.309        0.0271
                     (0.263)      (0.413)      (0.239)      (0.267)
Observations           390          390          385          385
Mean in control       13.80        11.69        11.79        9.819

Note: The unit of observation is a Gram Panchayat (GP). The dependent variables are the number of projects registered in the public data portal (nrega.nic.in) on May 15, 2013 (1); the number of projects declared as ongoing in nrega.nic.in (2); and the number of registered (3) and ongoing (4) projects found by surveyors in June-July 2013. Out of 5,390 projects registered in nrega.nic.in for the 390 GPs of the survey sample, a random sample of 3,900 projects was surveyed (10 per GP). The number of projects found in the survey is scaled up using the number of registered projects divided by the number of sampled projects. Five GPs (28 projects) could not be surveyed. All specifications include district fixed effects.
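The scaling rule in the note to Table 6 can be written out explicitly: the count found among the sampled projects is multiplied by the ratio of registered to sampled projects. The sketch below assumes the scaling is applied GP by GP, which the note does not spell out; the function name and the example GP are hypothetical.

```python
def scaled_found(found_in_sample, sampled, registered):
    """Scale the count found among sampled projects up to all registered projects."""
    if sampled == 0:
        raise ValueError("no sampled projects for this GP")
    return found_in_sample * registered / sampled


# Hypothetical GP: 14 projects registered, 10 sampled, 8 of those found on the ground.
print(scaled_found(found_in_sample=8, sampled=10, registered=14))
```

With this scaling, the outcomes in columns (3) and (4) are expressed in the same units as the registered counts in columns (1) and (2), so the control means can be compared across columns.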

Table 7: Impact of the reform on fake beneficiaries: Evidence from matching of nrega.nic.in job cards with SECC census

                                          Job cards with at least one working member
                     All job cards        Intervention period       Post intervention
                  (as of April 2014)     July 2012-March 2013     Apr 2013-March 2014
                         (1)                     (2)                      (3)

Panel A: Match rate for job cards with one member only
Treatment              0.0187                  0.0181                   0.0107
                     (0.00741)               (0.00766)                (0.00696)
Observations            3095                    2868                     2922
Mean in control         0.644                   0.673                    0.698

Panel B: Match rate for job cards with two members or more
Treatment              0.0135                  0.0126                   0.0104
                     (0.00613)               (0.00764)                (0.00732)
Observations            3093                    2836                     2906
Mean in control         0.243                   0.282                    0.286

Note: The unit of observation is a GP. The dependent variable is the fraction of job cards from nrega.nic.in matched by name with households from the SECC census. A job card with two members or more is matched when at least two members have been matched by name with a census household. The nrega.nic.in data was extracted from the nrega.nic.in server; it covers the period from July 2011 to March 2014. Treatment is a dummy which is equal to one for the blocks selected for the intervention. All specifications include district fixed effects.

Stargazer

• In practice, in R, use the "stargazer" package
• It will create standardized tables and can output them for you in HTML, LaTeX, Word, or ASCII
• The default is not bad
• But you can customize it to choose which statistics to include, which coefficients to include, etc.
• https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf
• http://jakeruss.com/cheatsheets/stargazer.html

Default output

Table 1: Results

                                     Dependent variable:
                                         ftvoteshare
                                     (1)            (2)
fncandidates                       0.134***
                                   (0.007)
RESprior                                          0.059**
                                                  (0.024)
Constant                            0.012         0.094***
                                   (0.010)        (0.019)
Observations                         372            372
R2                                  0.532          0.016
Adjusted R2                         0.530          0.013
Residual Std. Error (df = 370)      0.153          0.222
F Statistic (df = 1; 370)         420.093***      6.056**

Note: *p<0.1; **p<0.05; ***p<0.01

Improve on it

• Use transparent label names for dependent and independent variables
• Choose the coefficients you would like to include
• No need to include both R squared and adjusted R squared
• Maybe report the mean of the dependent variable in the control group rather than the constant
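The improvements listed above can be illustrated independently of any package. The following is a minimal Python sketch, not stargazer and not its API, that prints the default-output numbers with readable labels, a single R squared, and exactly three horizontal rules. The function `format_table` and the labels "Number of candidates", "Prior reservation", and "Vote share" are hypothetical placeholders for `fncandidates`, `RESprior`, and `ftvoteshare`; the slides do not say what those variables mean.

```python
def format_table(title, columns, rows, stats, width=12):
    """Render a plain-text regression table with exactly three horizontal rules."""
    body = rows + stats
    label_width = max(len(label) for label, _ in body) + 2
    rule = "-" * (label_width + width * len(columns))
    lines = [title, rule]
    lines.append(" " * label_width + "".join(f"{c:>{width}}" for c in columns))
    lines.append(rule)
    for label, cells in body:
        # Labels are left-aligned; numeric cells are right-aligned.
        lines.append(f"{label:<{label_width}}" + "".join(f"{c:>{width}}" for c in cells))
    lines.append(rule)
    return "\n".join(lines)


# Coefficients and standard errors from the default output; labels are placeholders.
rows = [
    ("Number of candidates", ["0.134", ""]),
    ("", ["(0.007)", ""]),
    ("Prior reservation", ["", "0.059"]),
    ("", ["", "(0.024)"]),
]
stats = [
    ("Observations", ["372", "372"]),
    ("R squared", ["0.53", "0.02"]),
]
table = format_table("Dependent variable: Vote share", ["(1)", "(2)"], rows, stats)
print(table)
```

The design choice is to leave out everything the reader does not need (adjusted R squared, residual standard error, the F statistic, the constant) rather than to restyle it; in R the same selection would be made through the package's own options.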

Visualizing regression results

[Figure 1: Overview of Bandhan Results]

[Figure 2: Endline 1, At a Glance]

Tools and resources

• R is a great visualization tool (especially ggplot; you need to pick up a good book to choose colors, etc.)
• Yau's FlowingData has many tutorials on how to do things in R (http://flowingdata.com/category/tutorials/) and a 4-week mini-course you can take at your own pace
• R handles/produces maps as well
• At this site you can find Tufte's charts in R: http://motioninsocial.com/tufte/

References

• Edward Tufte, Visual Display of Quantitative Information, Graphics Press, 2013 (second edition).
• Jonathan Schwabish, "An Economist's Guide to Visualizing Data," Journal of Economic Perspectives, 2014, vol. 28, number 1.
• Banerjee, Chattopadhyay, Duflo, Shapiro, "The long term impact of a graduation program: Evidence from West Bengal," MIMEO, MIT.
• Duflo, Esther, "Schooling and Labor Market Consequences of School Construction in Indonesia," American Economic Review, 2001.
• Chetty, Raj, Michael Stepner, Sarah Abraham, Shelby Lin, Benjamin Scuderi, Nicholas Turner, Augustin Bergeron, and David Cutler, "The association between income and life expectancy in the United States, 2001-2014," JAMA 315, no. 16 (2016): 1750-1766.
• Bui, Quoctrung, and Neil Irwin, "The Rich Live Longer Everywhere. For the Poor, Geography Matters," New York Times, April 11, 2016.
• Chen, Yuyu, Avraham Ebenstein, Michael Greenstone, and Hongbin Li, "Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China's Huai River Policy."
• Lee, David S., Enrico Moretti, and Matthew J. Butler, "Do voters affect or elect policies? Evidence from the US House," The Quarterly Journal of Economics (2004): 807-859.
• Klerman, Jacob Alex, and Caroline Danielson, "The Transformation of the Supplemental Nutrition Assistance Program," Journal of Policy Analysis and Management, vol. 30, no. 4, 2011, pp. 863-888.
• Stinebrickner, Ralph, and Todd Stinebrickner, "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model," Western University, CIBC Working Paper 2013-5 (July), 2013. http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf
• Ottaviano, Gianmarco I. P., and Giovanni Peri, "Immigration and National Wages: Clarifying the Theory and the Empirics," NBER Working Paper 14188 (July), 2008.
• Social Security Advisory Board, Aspects of Disability Decision Making: Data and Materials, Social Security Administration (February), 2012. http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf
• Social Security Administration, Fast Facts & Figures About Social Security, 2009, Social Security Administration (July), 2009. http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf
• Hanson, Gordon H., 2012, "The Rise of Middle Kingdoms: Emerging Economies in Global Trade," Journal of Economic Perspectives 26(2): 41-64.

Page 9: 216 Journal of Economic Perspectives - edX€¦ · Also notice the olum n chart layout (recall Figure 3; also see Schw abish 2013c). A lso notice the ((subjective) decision to omit

An Economistrsquos Guide to Visualizing Data 225

in this case and is a useful approach when labels are diffi cult to fi t in the vertical in this case and is a useful approach when labels are diffi cult to fi t in the vertical column chart layout (recall Figure 3 also see Schwabish 2013c) Also notice the column chart layout (recall Figure 3 also see Schwabish 2013c) Also notice the (subjective) decision to omit the y-axis the usefulness of the y-axis is doubtful with (subjective) decision to omit the y-axis the usefulness of the y-axis is doubtful with data labels placed on top of each columndata labels placed on top of each column

Figure 9ATwo Pie Charts for Comparison

Aggregate income by source

1962

Other16

Governmentemployeepensions

6

Assetincome15 Earnings

28

SocialSecurity30

Privatepensions

3

2007

Other3

Governmentemployeepensions

8Private

pensions9

Assetincome16

Earnings29

SocialSecurity36

Shares of Aggregate Income 1962 and 2007

Source Social Security Administration (2009)

Figure 9BAlternative to a Pie Chart A Paired Column Chart

Shares of Aggregate Income 1962 and 2009(Percent)

30 28

15

36

18

38

29

11 9 9

4

Social Security Earnings Asset income Privatepensions

Governmentemployeepensions

Other

1962 2009

37 56

226 Journal of Economic Perspectives

Alternatively the Alternatively the stacked bar chart in Figure 9C in Figure 9C shows the distribution of the shows the distribution of the various groups and that the groups sum to 100 percent while also highlighting various groups and that the groups sum to 100 percent while also highlighting differences from one year to the other Finally the differences from one year to the other Finally the slope chart in Figure 9D in Figure 9D also also shows the difference in each category from the fi rst year to the last by pairing points shows the difference in each category from the fi rst year to the last by pairing points on two vertical axes Slope charts can be used for a variety of purposes including on two vertical axes Slope charts can be used for a variety of purposes including showing correlations for example the relationship between a statersquos obesity rate showing correlations for example the relationship between a statersquos obesity rate and the share of people with at least a bachelorrsquos degree (Cairo 2013) In this and the share of people with at least a bachelorrsquos degree (Cairo 2013) In this example the color contrast (or what appears as different shades of grey in the example the color contrast (or what appears as different shades of grey in the black-and-white printed version) identifi es which categories increased over time black-and-white printed version) identifi es which categories increased over time (blue darker) and those that declined (orange lighter)(blue darker) and those that declined (orange lighter)

Figure 9CAlternative to a Pie Chart A Stacked Bar Chart

Shares of Aggregate Income 1962 and 2009(Percent)

30

38

28

29

18

4

15

11

6

9

3

9

1962

2009

Social Security Earnings Other Private

pensions

Government

employee pensionsAsset income

Figure 9DAlternative to a Pie Chart The Slope Chart

Shares of Aggregate Income 1962 and 2009(Percent)

38

29

11 9

4

1962 2009

Social Security 30 Earnings 28

Private pensions 3

Other 18Asset income 15

Government employeepensions 6

38 56

226 Journal of Economic Perspectives

Alternatively the Alternatively the stacked bar chart in Figure 9C in Figure 9C shows the distribution of the shows the distribution of the various groups and that the groups sum to 100 percent while also highlighting various groups and that the groups sum to 100 percent while also highlighting differences from one year to the other Finally the differences from one year to the other Finally the slope chart in Figure 9D in Figure 9D also also shows the difference in each category from the fi rst year to the last by pairing points shows the difference in each category from the fi rst year to the last by pairing points on two vertical axes Slope charts can be used for a variety of purposes including on two vertical axes Slope charts can be used for a variety of purposes including showing correlations for example the relationship between a statersquos obesity rate showing correlations for example the relationship between a statersquos obesity rate and the share of people with at least a bachelorrsquos degree (Cairo 2013) In this and the share of people with at least a bachelorrsquos degree (Cairo 2013) In this example the color contrast (or what appears as different shades of grey in the example the color contrast (or what appears as different shades of grey in the black-and-white printed version) identifi es which categories increased over time black-and-white printed version) identifi es which categories increased over time (blue darker) and those that declined (orange lighter)(blue darker) and those that declined (orange lighter)


A data set in pictures: Chetty et al., JAMA 2016

Copyright 2016 American Medical Association. All rights reserved.

women, the comparable changes were 0.23 years in the highest quartile and 0.10 years in the lowest quartile (P < .001). These differences persisted after controlling for the higher growth rate of income for individuals in the top quartile relative to the bottom quartile (eTable 4 in the Supplement).

The lower panels of Figure 3 show the annual increase in race-adjusted life expectancy by income ventiles. The annual increase in longevity was 0.18 years for men (which translates to an increase of 2.34 years from 2001-2014) and 0.22 years for women (an increase of 2.91 years from 2001-2014) in the top 5% of the income distribution. In the bottom 5% of the income distribution, the average annual increase in longevity was 0.02 years (an increase of 0.32 years from 2001-2014) for men and 0.003 years (an increase of 0.04 years from 2001-2014) for women (P < .001 for the differences between top and bottom 5% of income distributions for both sexes).

Local Area Variation in Life Expectancy by Income

Levels of Life Expectancy by Commuting Zone

Life expectancy varied significantly across areas within the United States, especially for low-income individuals. Figure 4 shows life expectancy by income ventile for New York, New York; San Francisco, California; Dallas, Texas; and Detroit, Michigan. There was substantial variation across these areas for low-income individuals, but little variation for high-income individuals. Life expectancy ranged from 72.3 years to 78.6 years for men in the lowest income ventile across these 4 cities; the corresponding range for men in the top ventile was 86.5 years to 87.5 years.

The results in Figure 4 are representative of the variation across commuting zones more generally. The SD of life expectancy across all commuting zones (weighted by population) was 1.39 years for men in the bottom income quartile vs 0.70 years in the top income quartile (P < .001). Life expectancy varied less across areas for women than men in the bottom income quartile, and the amount of variation across commuting zones also declined with income for women (eTable 5 in the Supplement).

Figure 5 shows maps of expected age at death by commuting zone for men and women in the bottom and top quartiles of the national income distribution (maps for the middle-income quartiles appear in eFigure 10 in the Supplement). For individuals in the bottom income quartile, life expectancy differed by about 5 years for men and 4 years for women between the lowest and highest longevity commuting zones (P < .001 for both sexes). A summary of standard errors by commuting zone appears in part V.C of the eAppendix and in eFigure 11.

Nevada, Indiana, and Oklahoma had the lowest life expectancies (<77.9 years) when men and women in the bottom income quartile were averaged. Of the 10 states with the lowest levels of life expectancy for individuals in the bottom income quartile, 8 formed a geographic belt from Michigan to Kansas (Michigan, Ohio, Indiana, Kentucky, Tennessee, Arkansas, Oklahoma, Kansas). The states with the highest life expectancies for individuals in the bottom income quartile (>80.6 years) were California, New York, and Vermont. Life expectancy in the South was similar to the national mean for both sexes (−0.22 years [P = .47] for women and −0.96 years [P = .03] for men) in the bottom income quartile. Individuals in the top income quartile had the lowest life expectancies (<85.3 years) in Nevada, Hawaii, and Oklahoma. Individuals in the top income quartile had the highest life expectancies (>87.6 years) in Utah, Washington, DC, and Vermont.

Table 1 lists the top 10 and bottom 10 commuting zones in mean life expectancy (averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The expected age at death for the bottom quartile ranged from 74.2 years for men and 80.7 years for women in Gary, Indiana, to 79.5 years for men and 84.0 years for women in New York, New York. The commuting zones with the highest life expectancies

Figure 2. Race- and Ethnicity-Adjusted Life Expectancy for 40-Year-Olds by Household Income Percentile, 2001-2014

[Figure: expected age at death for 40-year-olds (y-axis, 70 to 90 years) plotted against household income percentile (x-axis, 0 to 100, with mean household income in thousands of dollars marked beneath), shown separately for women and men.]

Expected age at death, y
Women, by household income percentile: bottom 1%, 78.8 (95% CI, 78.7-78.9); top 1%, 88.9 (95% CI, 88.7-89.1)
Men, by household income percentile: bottom 1%, 72.7 (95% CI, 72.6-72.9); top 1%, 87.3 (95% CI, 87.2-87.5)

Life expectancies were calculated using survival curves analogous to those in Figure 1. The vertical height of each bar depicts the 95% confidence interval. The difference between expected age at death in the top and bottom income percentiles is 10.1 years (95% CI, 9.9-10.3 years) for women and 14.6 years (95% CI, 14.4-14.8 years) for men. To control for differences in life expectancies across racial and ethnic groups, race and ethnicity adjustments were calculated using data from the National Longitudinal Mortality Survey, and estimates were reweighted so that each income percentile bin has the same fraction of black, Hispanic, and Asian adults.
a Averaged across years and ages. The data are in thousands unless otherwise indicated.

Association Between Income and Life Expectancy in the United States. Special Communication, Clinical Review & Education.

jama.com (Reprinted) JAMA, published online April 10, 2016, E5


A data set in pictures: Chetty et al., JAMA 2016, as rendered in the NYT

A data set in pictures: Chetty et al., JAMA 2016

were clustered in California (6 of the top 10), whereas the commuting zones with the lowest life expectancies were clustered in the industrial Midwest (5 of the bottom 10). The commuting zones with the highest life expectancies for those in the bottom income quartile also had the smallest gaps in life expectancy between the top and bottom quartiles (r = −0.82, P < .001). The expected age at death for the top income quartile ranged from 82.8 years for men and 85.3 years for women in Las Vegas, Nevada, to 86.6 years for men and 89.0 years for women in Salt Lake City, Utah. The areas with the highest and lowest life expectancies for those in the top income quartile were less clustered geographically; for example, California had commuting zones in both the top 10 and bottom 10 of the list.

The differences in life expectancy across commuting zones were similar in analyses with income measures adjusted for cost of living, with controls for differences across areas in the income distribution within each quartile, and using measures of loss in life years up to the age of 77 years that did not make use of extrapolations beyond observed ages (part IV.C of the eAppendix and eTable 6 in the Supplement). There was also considerable variation in life expectancy across counties within commuting zones (part V of the eAppendix, eFigure 12, and eTable 7).

Trends in Life Expectancy

Similar to levels of life expectancy, temporal trends varied significantly across geographic areas. Figure 6 maps the annual change in life expectancy between 2001 and 2014 by state for men and women in the bottom income quartile. Hawaii, Maine, and Massachusetts had the largest gains in life expectancy (gaining >0.19 years annually) when men and women in the bottom income quartile were averaged. The states in which low-income individuals experienced the largest losses in life expectancy (losing >0.09 years annually) were Alaska, Iowa, and Wyoming.

Table 2 lists the top 10 and bottom 10 commuting zones in terms of trends in life expectancy (when averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The estimated trends for individuals in the bottom income quartile ranged from an annual gain of 0.38 years in Toms River, New Jersey, to an annual loss of 0.17 years in Tampa, Florida. Gaps in life expectancy between the bottom and top income quartiles generally declined or remained stable in areas in which the bottom income quartile experienced the largest gains in life expectancy, such as Toms River, New Jersey. In contrast, gaps in life expectancy between the top and bottom income quartiles increased by approximately 0.3 years annually in places such as Tampa, Florida.

Figure 7 shows race- and ethnicity-adjusted life expectancies by year for men and women in the bottom income quartile in 2 commuting zones in the top 10 (Birmingham, Alabama, and Cincinnati, Ohio) and 2 commuting zones in the bottom 10 (Knoxville, Tennessee, and Tampa, Florida). This Figure shows that trends in life expectancy across these areas diverged continuously throughout the 2000s. For example, life expectancy increased by approximately 3.2 years from 2001 through 2014 for men and women in Cincinnati, Ohio, but declined by approximately 2.2 years in Tampa, Florida.

Correlates of Local Area Variation in Life Expectancy

Figure 8 shows correlations of commuting zone-level estimates of race- and ethnicity-adjusted life expectancy for the bottom income quartile with local area characteristics. The correlations are divided into 6 groups: health behaviors, access to health care, environmental factors, income inequality and social cohesion, local labor market conditions, and other factors. Data for men and women are combined; correlations were similar by sex (eTable 8 in the Supplement). County-level correlations were also similar (eTable 9).

Health Behaviors

Life expectancy was negatively correlated with rates of smoking (r = −0.69, P < .001) and obesity (r = −0.47, P < .001) and positively correlated with exercise rates (r = 0.32, P = .004) among individuals in the bottom income quartile. The maps for rates of smoking, obesity, and exercise among low-income individuals were similar to those for life expectancy (eFigure 13 in the Supplement).

Figure 4. Race- and Ethnicity-Adjusted Life Expectancy by Income Ventile in Selected Commuting Zones, 2001-2014

[Figure: two panels, Men and Women. Each plots expected age at death for 40-year-olds (y-axis, 70 to 90 years) against household income ventile (x-axis, 0 to 20, with mean household income in thousands of dollars marked beneath) for Detroit, MI; New York, NY; San Francisco, CA; and Dallas, TX.]

Estimates of race- and ethnicity-adjusted expected age at death for 40-year-olds, computed by income ventile (5 percentile point bins).
a Averaged across years and ages.


Visualizing a research strategy

Figure 1: Pollution in China and the Huai River/Qinling Mountain Range

Notes: The cities shown are the locations of the Disease Surveillance Points. Cities north of the solid line were covered by the home heating policy. The figure coloring is generated by interpolating PM10 levels at the 12 nearest pollution monitoring stations to create a high-resolution grid of pollution throughout China (1 degree latitude cell width). Areas that are not within acceptable range of a station are left in white.

Visualizing a research strategy

Figure 2: Particulate Matter Levels (PM10) South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging PM10 across the Disease Surveillance Point (DSP) locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

[Figure: PM10 (µg/m³, y-axis) plotted against degrees north of the Huai River boundary (x-axis, -20 to 20). Legend: PM10 in South, PM10 in North, Local Linear Regression.]

The estimated change in PM10 (and height of the brace) just north of the Huai River is 41.6 µg/m³ and is statistically significant (95% CI: 11.6, 71.6).
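The figure notes describe how each plotted circle is built: a population-weighted average within a 1 degree latitude range, with circle size proportional to total population. A minimal Python sketch of that aggregation follows. The data points are hypothetical, the function name is invented for illustration, and aligning bins to whole degrees is an assumption (the notes do not say where bin edges fall).

```python
import math
from collections import defaultdict


def weighted_latitude_bins(points, bin_width=1.0):
    """Aggregate (degrees_north, value, population) points into latitude bins.

    Returns {bin_start: (population_weighted_mean, total_population)}.
    ASSUMPTION: bins start at whole multiples of bin_width.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # bin -> [sum(value * pop), sum(pop)]
    for degrees_north, value, population in points:
        bin_start = math.floor(degrees_north / bin_width) * bin_width
        sums[bin_start][0] += value * population
        sums[bin_start][1] += population
    return {
        bin_start: (weighted_sum / total_pop, total_pop)
        for bin_start, (weighted_sum, total_pop) in sorted(sums.items())
    }


# Hypothetical locations: (degrees north of the boundary, PM10, population).
DSP_POINTS = [
    (-2.4, 90.0, 100_000),
    (-2.1, 110.0, 300_000),
    (0.3, 120.0, 200_000),
    (0.8, 140.0, 200_000),
]

if __name__ == "__main__":
    for bin_start, (mean_value, total_pop) in weighted_latitude_bins(DSP_POINTS).items():
        # mean_value sets the circle's height; total_pop would set its size.
        print(bin_start, mean_value, total_pop)
```

Weighting by population means a large city moves the bin average more than a small one, which is also why the circle size is worth showing on the chart.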

Visualizing a research strategy

Figure 3: Life Expectancy South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point (DSP) locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

[Figure: life expectancy (years, y-axis, 70 to 85) plotted against degrees north of the Huai River boundary (x-axis, -20 to 20). Legend: LE in South, LE in North, Local Linear Regression.]

The estimated change in life expectancy (and height of the brace) just north of the Huai River is -3.1 years and is statistically significant (95% CI: -5.0, -1.3).
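The "height of the brace" in these figures is the gap between two fitted lines at the boundary, each estimated using only observations on one side. The sketch below shows one way such a jump could be computed: a kernel-weighted linear fit on each side, evaluated at zero degrees. The triangular kernel, the bandwidth, and the data are all assumptions made for illustration; the source does not state which kernel or bandwidth was used.

```python
def fit_line(xs, ys, ws):
    """Weighted least-squares line through (xs, ys); returns (intercept, slope)."""
    w_total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def jump_at_boundary(points, bandwidth=5.0):
    """Estimate the north-minus-south gap at 0 degrees.

    points: iterable of (degrees_north, outcome).
    ASSUMPTION: triangular kernel weights 1 - |x| / bandwidth, zero outside.
    """
    def intercept_for(side):
        xs, ys, ws = [], [], []
        for x, y in side:
            weight = 1.0 - abs(x) / bandwidth
            if weight > 0:
                xs.append(x)
                ys.append(y)
                ws.append(weight)
        intercept, _slope = fit_line(xs, ys, ws)
        return intercept  # fitted value at the boundary (x = 0)

    south = [(x, y) for x, y in points if x < 0]
    north = [(x, y) for x, y in points if x >= 0]
    return intercept_for(north) - intercept_for(south)


# Hypothetical data: the same slope on both sides, with a drop of 3 years at 0.
POINTS = [(x, 75.0 + 0.1 * x) for x in (-4, -3, -2, -1)] + [
    (x, 72.0 + 0.1 * x) for x in (1, 2, 3, 4)
]

if __name__ == "__main__":
    print(round(jump_at_boundary(POINTS), 3))
```

Fitting each side separately matters: a single line through all the points would smooth over the very discontinuity the figure is designed to show.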

Tables

• Use the same principles:
    • Show the data
    • Don't lie about it
    • Focus
• Which translates into:
    • Report the important numbers, not all the coefficients
    • Keep the number of significant digits down
    • No vertical lines
    • Very few horizontal lines; don't box results (3 lines are usually enough)
    • When in doubt, align left
    • Report the statistics that the reader will need, not necessarily those that come by default
• Sample questions to ask yourself:
    • Do you need the R²?
    • Are there important tests to report (say, equality of two coefficients)?
    • You may need the mean in the control group
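One item in the list above, keeping the number of significant digits down, is easy to automate. The helper below is a small Python sketch (the function name is made up for illustration) that rounds a number to a chosen count of significant digits before it goes into a table.

```python
import math


def round_sig(value, digits=2):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    magnitude = math.floor(math.log10(abs(value)))  # position of the leading digit
    return round(value, digits - 1 - magnitude)


if __name__ == "__main__":
    # Illustrative inputs of different magnitudes.
    for raw in (0.0494, 0.00741, 13.80, 420.093):
        print(raw, "->", round_sig(raw, 2))
```

Rounding by significant digits rather than by decimal places keeps small standard errors readable without padding large statistics with digits nobody needs.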

Table 6: Impact of the reform on MGNREGS projects: Evidence from asset survey

                   Number registered          Number found
                   All projects   Ongoing     All projects   Ongoing
                   (1)            (2)         (3)            (4)

Treatment          0.0494         -0.210      0.309          0.0271
                   (0.263)        (0.413)     (0.239)        (0.267)

Observations       390            390         385            385
Mean in control    13.80          11.69       11.79          9.819

Note: The unit of observation is a Gram Panchayat (GP). The dependent variables are the number of projects registered in the public data portal (nrega.nic.in) on May 15, 2013 (1), the number of projects declared as ongoing in nrega.nic.in (2), and the number of registered (3) and ongoing (4) projects found by surveyors in June-July 2013. Out of 5390 projects registered in nrega.nic.in for the 390 GP of the survey sample, a random sample of 3900 projects was surveyed (10 per GP). The number of projects found in the survey is scaled up using the number of registered projects divided by the number of sampled projects. 5 GP (28 projects) could not be surveyed. All specifications include district fixed effects.
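The scaling step in the note to Table 6 is simple arithmetic, sketched below in Python under the assumption that the ratio is applied per GP. The function name and the example counts are hypothetical; only the rule itself (found count times registered over sampled) comes from the note.

```python
def scaled_found(found_in_sample, sampled, registered):
    """Scale the count found among sampled projects up to all registered projects."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return found_in_sample * registered / sampled


if __name__ == "__main__":
    # Hypothetical GP: 14 projects registered, 10 sampled, 8 of those found.
    print(scaled_found(found_in_sample=8, sampled=10, registered=14))
```

Scaling this way puts the "number found" columns on the same footing as the "number registered" columns, so the two can be compared directly.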

Table 7: Impact of the reform on fake beneficiaries: Evidence from matching of nrega.nic.in job cards with SECC census

                   All job cards        Job cards with at least one working member
                   (as of April 2014)   Intervention period     Post intervention
                                        July 2012-March 2013    Apr 2013-March 2014
                   (1)                  (2)                     (3)

Panel A: Match rate for job cards with one member only
Treatment          0.0187               0.0181                  0.0107
                   (0.00741)            (0.00766)               (0.00696)
Observations       3095                 2868                    2922
Mean in control    0.644                0.673                   0.698

Panel B: Match rate for job cards with two members or more
Treatment          0.0135               0.0126                  0.0104
                   (0.00613)            (0.00764)               (0.00732)
Observations       3093                 2836                    2906
Mean in control    0.243                0.282                   0.286

Note: The unit of observation is a GP. The dependent variable is the fraction of job cards from nrega.nic.in matched by name with households from the SECC census. A job card with two members or more is matched when at least two members have been matched by name with a census household. The nrega.nic.in data was extracted from the nrega.nic.in server; it covers the period from July 2011 to March 2014. Treatment is a dummy which is equal to one for the blocks selected for the intervention. All specifications include district fixed effects.

Stargazer

• In practice, in R, use the "stargazer" package
• It will create standardized tables and can output them for you in HTML, LaTeX, Word, or ASCII
• The default is not bad
• But you can customize it to choose what statistics to include, choose what coefficients to include, etc.
• https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf
• http://jakeruss.com/cheatsheets/stargazer.html

Default output

Table 1: Results

                                  Dependent variable: ftvoteshare
                                  (1)            (2)

fncandidates                      0.134***
                                  (0.007)
RESprior                                         0.059**
                                                 (0.024)
Constant                          0.012          0.094***
                                  (0.010)        (0.019)

Observations                      372            372
R2                                0.532          0.016
Adjusted R2                       0.530          0.013
Residual Std. Error (df = 370)    0.153          0.222
F Statistic (df = 1; 370)         420.093***     6.056**

Note: *p<0.1; **p<0.05; ***p<0.01

Improve on it

• Transparent label names for dependent and independent variables
• Choose the coefficients you would like to include
• No need to include both R-squared and adjusted R-squared
• Maybe the mean of the dependent variable in the control group rather than the constant
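To make these suggestions concrete, here is a hedged Python sketch that re-renders the two models from the default output with fewer rows: labeled variables, no constant, no adjusted R-squared, and only three horizontal rules. It is an illustration in plain Python, not the stargazer package itself. The estimates are copied from the default output; the display labels are placeholders, because the source does not say what `fncandidates`, `RESprior`, or `ftvoteshare` measure.

```python
# Estimates from the default output: name -> (coefficient, std. error, stars).
MODELS = [
    {"coefs": {"fncandidates": (0.134, 0.007, "***")}, "n": 372, "r2": 0.532},
    {"coefs": {"RESprior": (0.059, 0.024, "**")}, "n": 372, "r2": 0.016},
]

# PLACEHOLDER labels: replace with transparent names once the variables are known.
LABELS = {
    "fncandidates": "Label for fncandidates",
    "RESprior": "Label for RESprior",
}


def render_table(models, labels, dep_var_label, col_width=12):
    """Render a left-aligned text table with exactly three horizontal rules."""
    label_width = max(len(text) for text in list(labels.values()) + ["Observations"]) + 2
    rule = "-" * (label_width + col_width * len(models))
    header = "".ljust(label_width) + "".join(
        f"({i})".ljust(col_width) for i in range(1, len(models) + 1)
    )
    lines = [f"Dependent variable: {dep_var_label}", rule, header, rule]
    for name, label in labels.items():
        estimates, errors = [], []
        for model in models:
            if name in model["coefs"]:
                coef, se, stars = model["coefs"][name]
                estimates.append(f"{coef:.3f}{stars}")
                errors.append(f"({se:.3f})")
            else:
                estimates.append("")
                errors.append("")
        lines.append(label.ljust(label_width) + "".join(c.ljust(col_width) for c in estimates))
        lines.append("".ljust(label_width) + "".join(c.ljust(col_width) for c in errors))
    lines.append("Observations".ljust(label_width) + "".join(str(m["n"]).ljust(col_width) for m in models))
    lines.append("R-squared".ljust(label_width) + "".join(f"{m['r2']:.3f}".ljust(col_width) for m in models))
    lines.append(rule)
    return "\n".join(line.rstrip() for line in lines)


if __name__ == "__main__":
    print(render_table(MODELS, LABELS, "Label for ftvoteshare"))
```

A row for the control-group mean could be added in the same way as the Observations row, if that statistic is available; it is omitted here because the default output does not report it.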

Visualizing regression results

Figure 1: Overview of Bandhan Results

Figure 2: Endline 1: At a Glance

Tools and resources

• R is a great visualization tool (especially ggplot; you need to pick up a good book to choose colors, etc.)
• Yau's FlowingData has many tutorials on how to do things in R (http://flowingdata.com/category/tutorials/) and a 4-week mini-course you can take at your own pace
• R handles/produces maps as well
• At this site you can find Tufte's charts in R: http://motioninsocial.com/tufte/

References

• Tufte, Edward. The Visual Display of Quantitative Information. Graphics Press, 2013 (second edition).
• Schwabish, Jonathan. "An Economist's Guide to Visualizing Data." Journal of Economic Perspectives, 2014, vol. 28, no. 1.
• Banerjee, Chattopadhyay, Duflo, Shapiro. "The long term impact of a graduation program: Evidence from West Bengal." Mimeo, MIT.
• Duflo, Esther. "Schooling and Labor Market Consequences of School Construction in Indonesia." American Economic Review, 2001.
• Chetty, Raj, Michael Stepner, Sarah Abraham, Shelby Lin, Benjamin Scuderi, Nicholas Turner, Augustin Bergeron, and David Cutler. "The association between income and life expectancy in the United States, 2001-2014." JAMA 315, no. 16 (2016): 1750-1766.
• Bui, Quoctrung, and Neil Irwin. "The Rich Live Longer Everywhere. For the Poor, Geography Matters." New York Times, April 11, 2016.
• Chen, Yuyu, Avraham Ebenstein, Michael Greenstone, and Hongbin Li. "Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China's Huai River Policy."
• Lee, David S., Enrico Moretti, and Matthew J. Butler. "Do voters affect or elect policies? Evidence from the US House." The Quarterly Journal of Economics (2004): 807-859.
• Klerman, Jacob Alex, and Caroline Danielson. "The Transformation of the Supplemental Nutrition Assistance Program." Journal of Policy Analysis and Management, vol. 30, no. 4, 2011, pp. 863-888.
• Stinebrickner, Ralph, and Todd Stinebrickner. "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model." Western University, CIBC Working Paper 2013-5 (July), 2013. http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf
• Ottaviano, Gianmarco I. P., and Giovanni Peri. "Immigration and National Wages: Clarifying the Theory and the Empirics." NBER Working Paper 14188 (July), 2008.
• Social Security Advisory Board. Aspects of Disability Decision Making: Data and Materials. Social Security Administration (February), 2012. http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf
• Social Security Administration. Fast Facts & Figures About Social Security, 2009. Social Security Administration (July), 2009. http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf
• Hanson, Gordon H. 2012. "The Rise of Middle Kingdoms: Emerging Economies in Global Trade." Journal of Economic Perspectives 26(2): 41-64.

Page 10: 216 Journal of Economic Perspectives - edX€¦ · Also notice the olum n chart layout (recall Figure 3; also see Schw abish 2013c). A lso notice the ((subjective) decision to omit

226 Journal of Economic Perspectives

Alternatively the Alternatively the stacked bar chart in Figure 9C in Figure 9C shows the distribution of the shows the distribution of the various groups and that the groups sum to 100 percent while also highlighting various groups and that the groups sum to 100 percent while also highlighting differences from one year to the other Finally the differences from one year to the other Finally the slope chart in Figure 9D in Figure 9D also also shows the difference in each category from the fi rst year to the last by pairing points shows the difference in each category from the fi rst year to the last by pairing points on two vertical axes Slope charts can be used for a variety of purposes including on two vertical axes Slope charts can be used for a variety of purposes including showing correlations for example the relationship between a statersquos obesity rate showing correlations for example the relationship between a statersquos obesity rate and the share of people with at least a bachelorrsquos degree (Cairo 2013) In this and the share of people with at least a bachelorrsquos degree (Cairo 2013) In this example the color contrast (or what appears as different shades of grey in the example the color contrast (or what appears as different shades of grey in the black-and-white printed version) identifi es which categories increased over time black-and-white printed version) identifi es which categories increased over time (blue darker) and those that declined (orange lighter)(blue darker) and those that declined (orange lighter)

Figure 9CAlternative to a Pie Chart A Stacked Bar Chart

Shares of Aggregate Income 1962 and 2009(Percent)

30

38

28

29

18

4

15

11

6

9

3

9

1962

2009

Social Security Earnings Other Private

pensions

Government

employee pensionsAsset income

Figure 9DAlternative to a Pie Chart The Slope Chart

Shares of Aggregate Income 1962 and 2009(Percent)

38

29

11 9

4

1962 2009

Social Security 30 Earnings 28

Private pensions 3

Other 18Asset income 15

Government employeepensions 6

38 56

226 Journal of Economic Perspectives

Alternatively the Alternatively the stacked bar chart in Figure 9C in Figure 9C shows the distribution of the shows the distribution of the various groups and that the groups sum to 100 percent while also highlighting various groups and that the groups sum to 100 percent while also highlighting differences from one year to the other Finally the differences from one year to the other Finally the slope chart in Figure 9D in Figure 9D also also shows the difference in each category from the fi rst year to the last by pairing points shows the difference in each category from the fi rst year to the last by pairing points on two vertical axes Slope charts can be used for a variety of purposes including on two vertical axes Slope charts can be used for a variety of purposes including showing correlations for example the relationship between a statersquos obesity rate showing correlations for example the relationship between a statersquos obesity rate and the share of people with at least a bachelorrsquos degree (Cairo 2013) In this and the share of people with at least a bachelorrsquos degree (Cairo 2013) In this example the color contrast (or what appears as different shades of grey in the example the color contrast (or what appears as different shades of grey in the black-and-white printed version) identifi es which categories increased over time black-and-white printed version) identifi es which categories increased over time (blue darker) and those that declined (orange lighter)(blue darker) and those that declined (orange lighter)

Figure 9CAlternative to a Pie Chart A Stacked Bar Chart

Shares of Aggregate Income 1962 and 2009(Percent)

30

38

28

29

18

4

15

11

6

9

3

9

1962

2009

Social Security Earnings Other Private

pensions

Government

employee pensionsAsset income

Figure 9DAlternative to a Pie Chart The Slope Chart

Shares of Aggregate Income 1962 and 2009(Percent)

38

29

11 9

4

1962 2009

Social Security 30 Earnings 28

Private pensions 3

Other 18Asset income 15

Government employeepensions 6

39 56

A data set in pictures Chetty et alJAMA 2016

40 56

A data set in pictures Chetty et alJAMA 2016

Copyright 2016 American Medical Association All rights reserved

women the comparable changes were 023 years in the highestquartile and 010 years in the lowest quartile (P lt 001) These dif-ferences persisted after controlling for the higher growth rate ofincome for individuals in the top quartile relative to the bottomquartile (eTable 4 in the Supplement)

The lower panels of Figure 3 show the annual increase in race-adjusted life expectancy by income ventiles The annual increase inlongevity was 018 years for men (which translates to an increase of234 years from 2001-2014) and 022 years for women (an in-crease of 291 years from 2001-2014) in the top 5 of the incomedistribution In the bottom 5 of the income distribution the av-erage annual increase in longevity was 002 years (an increase of 032years from 2001-2014) for men and 0003 years (an increase of 004years from 2001-2014) for women (P lt 001 for the differences be-tween top and bottom 5 of income distributions for both sexes)

Local Area Variation in Life Expectancy by IncomeLevels of Life Expectancy by Commuting ZoneLife expectancy varied significantly across areas within the UnitedStates especially for low-income individuals Figure 4 shows life ex-pectancy by income ventile for New York New York San FranciscoCalifornia Dallas Texas and Detroit Michigan There was substan-tial variation across these areas for low-income individuals but littlevariation for high-income individuals Life expectancy ranged from723 years to 786 years for men in the lowest income ventile acrossthese 4 cities the corresponding range for men in the top ventilewas 865 years to 875 years

The results in Figure 4 are representative of the variation acrosscommuting zones more generally The SD of life expectancy acrossall commuting zones (weighted by population) was 139 years formen in the bottom income quartile vs 070 years in the top incomequartile (P lt 001) Life expectancy varied less across areas forwomen than men in the bottom income quartile and the amount

of variation across commuting zones also declined with income forwomen (eTable 5 in the Supplement)

Figure 5 shows maps of expected age at death by commutingzone for men and women in the bottom and top quartiles of the na-tional income distribution (maps for the middle-income quartiles ap-pear in eFigure 10 in the Supplement) For individuals in the bot-tom income quartile life expectancy differed by about 5 years formen and 4 years for women between the lowest and highest lon-gevity commuting zones (P lt 001 for both sexes) A summary ofstandard errors by commuting zone appears in part VC of theeAppendix and in eFigure 11

Nevada Indiana and Oklahoma had the lowest life expectan-cies (lt779 years) when men and women in the bottom income quar-tile were averaged Of the 10 states with the lowest levels of life ex-pectancy for individuals in the bottom income quartile 8 formed ageographic belt from Michigan to Kansas (Michigan Ohio IndianaKentucky Tennessee Arkansas Oklahoma Kansas) The states withthe highest life expectancies for individuals in the bottom incomequartile (gt806 years) were California New York and Vermont Lifeexpectancy in the South was similar to the national mean for bothsexes (minus022 years [P = 47] for women and minus096 years [P = 03]for men) in the bottom income quartile Individuals in the top in-come quartile had the lowest life expectancies (lt853 years) inNevada Hawaii and Oklahoma Individuals in the top income quar-tile had the highest life expectancies (gt876 years) in Utah Wash-ington DC and Vermont

Table 1 lists the top 10 and bottom 10 commuting zones in meanlife expectancy (averaging men and women) among the 100 mostpopulated commuting zones for individuals in the bottom and topincome quartiles The expected age at death for the bottom quar-tile ranged from 742 years for men and 807 years for women in GaryIndiana to 795 years for men and 840 years for women in New YorkNew York The commuting zones with the highest life expectancies

Figure 2 Race- and Ethnicity-Adjusted Life Expectancy for 40-Year-Olds by Household Income Percentile 2001-2014

90

85

80

75

70

0 100

19 million

20 million

80

112

119

60

71

77

40

45

50

Exp

ecte

d A

ge

at D

eath

fo

r 4

0-Y

ear-

Old

s y

Household Income Percentile

Mean household income in thousands $ a

WomenMen

20

24

26

Women

Men

Expected age at death y

Bottom 1 788 (95 CI 787-789)Women by household income percentile

Top 1 889 (95 CI 887-891)

Bottom 1 727 (95 CI 726-729)Men by household income percentile

Top 1 873 (95 CI 872-875)

Life expectancies were calculated using survival curves analogous to those inFigure 1 The vertical height of each bar depicts the 95 confidence intervalThe difference between expected age at death in the top and bottom incomepercentiles is 101 years (95 CI 99-103 years) for women and 146 years(95 CI 144-148 years) for men To control for differences in life expectanciesacross racial and ethnic groups race and ethnicity adjustments were calculated

using data from the National Longitudinal Mortality Survey and estimates werereweighted so that each income percentile bin has the same fraction of blackHispanic and Asian adultsa Averaged across years and ages The data are in thousands unless otherwise

indicated

Association Between Income and Life Expectancy in the United States Special Communication Clinical Review amp Education

jamacom (Reprinted) JAMA Published online April 10 2016 E5

Copyright 2016 American Medical Association All rights reserved

Downloaded From httpjamajamanetworkcom by Suman Ganguli on 04142016

41 56

A data set in pictures Chetty et alJAMA 2016 as rendered in NYT

42 56

A data set in pictures Chetty et alJAMA 2016

Copyright 2016 American Medical Association All rights reserved

were clustered in California (6 of the top 10), whereas the commuting zones with the lowest life expectancies were clustered in the industrial Midwest (5 of the bottom 10). The commuting zones with the highest life expectancies for those in the bottom income quartile also had the smallest gaps in life expectancy between the top and bottom quartiles (r = -0.82, P < .001). The expected age at death for the top income quartile ranged from 82.8 years for men and 85.3 years for women in Las Vegas, Nevada, to 86.6 years for men and 89.0 years for women in Salt Lake City, Utah. The areas with the highest and lowest life expectancies for those in the top income quartile were less clustered geographically; for example, California had commuting zones in both the top 10 and bottom 10 of the list.

The differences in life expectancy across commuting zones were similar in analyses with income measures adjusted for cost of living, with controls for differences across areas in the income distribution within each quartile, and using measures of loss in life years up to the age of 77 years that did not make use of extrapolations beyond observed ages (part IVC of the eAppendix and eTable 6 in the Supplement). There was also considerable variation in life expectancy across counties within commuting zones (part V of the eAppendix, eFigure 12, and eTable 7).

Trends in Life Expectancy

Similar to levels of life expectancy, temporal trends varied significantly across geographic areas. Figure 6 maps the annual change in life expectancy between 2001 and 2014 by state for men and women in the bottom income quartile. Hawaii, Maine, and Massachusetts had the largest gains in life expectancy (gaining >0.19 years annually) when men and women in the bottom income quartile were averaged. The states in which low-income individuals experienced the largest losses in life expectancy (losing >0.09 years annually) were Alaska, Iowa, and Wyoming.

Table 2 lists the top 10 and bottom 10 commuting zones in terms of trends in life expectancy (when averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The estimated trends for individuals in the bottom income quartile ranged from an annual gain of 0.38 years in Toms River, New Jersey, to an annual loss of 0.17 years in Tampa, Florida. Gaps in life expectancy between the bottom and top income quartiles generally declined or remained stable in areas in which the bottom income quartile experienced the largest gains in life expectancy, such as Toms River, New Jersey. In contrast, gaps in life expectancy between the top and bottom income quartiles increased by approximately 0.3 years annually in places such as Tampa, Florida.

Figure 7 shows race- and ethnicity-adjusted life expectancies by year for men and women in the bottom income quartile in 2 commuting zones in the top 10 (Birmingham, Alabama, and Cincinnati, Ohio) and 2 commuting zones in the bottom 10 (Knoxville, Tennessee, and Tampa, Florida). This figure shows that trends in life expectancy across these areas diverged continuously throughout the 2000s. For example, life expectancy increased by approximately 3.2 years from 2001 through 2014 for men and women in Cincinnati, Ohio, but declined by approximately 2.2 years in Tampa, Florida.

Correlates of Local Area Variation in Life Expectancy

Figure 8 shows correlations of commuting zone-level estimates of race- and ethnicity-adjusted life expectancy for the bottom income quartile with local area characteristics. The correlations are divided into 6 groups: health behaviors, access to health care, environmental factors, income inequality and social cohesion, local labor market conditions, and other factors. Data for men and women are combined; correlations were similar by sex (eTable 8 in the Supplement). County-level correlations were also similar (eTable 9).

Health Behaviors

Life expectancy was negatively correlated with rates of smoking (r = -0.69, P < .001) and obesity (r = -0.47, P < .001) and positively correlated with exercise rates (r = 0.32, P = .004) among individuals in the bottom income quartile. The maps for rates of smoking, obesity, and exercise among low-income individuals were similar to those for life expectancy (eFigure 13 in the Supplement).

Figure 4. Race- and Ethnicity-Adjusted Life Expectancy by Income Ventile in Selected Commuting Zones, 2001-2014

[Figure: two panels (Men, Women) plotting expected age at death for 40-year-olds (y, 70 to 90) against household income ventile (0 to 20), with mean household income in thousands ($) marked along the horizontal axis, for Detroit, MI; New York, NY; San Francisco, CA; and Dallas, TX.]

Estimates of race- and ethnicity-adjusted expected age at death for 40-year-olds computed by income ventile (5 percentile point bins).
a Averaged across years and ages.


43 56

Visualizing a research strategy

Figure 1: Pollution in China and the Huai River/Qinling Mountain Range

Notes: The cities shown are the locations of the Disease Surveillance Points. Cities north of the solid line were covered by the home heating policy. The figure coloring is generated by interpolating PM10 levels at the 12 nearest pollution monitoring stations to create a high-resolution grid of pollution throughout China (1 degree latitude cell width). Areas that are not within acceptable range of a station are left in white.

44 56

Visualizing a research strategy

Figure 2: Particulate Matter Levels (PM10) South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging PM10 across the Disease Surveillance Point locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

[Figure: PM10 (µg/m³) on the vertical axis against degrees north of the Huai River boundary (-20 to 20) on the horizontal axis. Legend: PM10 in South, PM10 in North, Local Linear Regression.]

The estimated change in PM10 (and height of the brace) just north of the Huai River is 41.6 µg/m³ and is statistically significant (95% CI: 11.6, 71.6).
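
The idea behind the plotted line and the estimated jump can be sketched in Python: fit a kernel-weighted line separately on each side of the boundary and compare the two fitted values at the boundary. This is only a minimal illustration on synthetic data, not the authors' estimator. The triangular kernel, the 5-degree bandwidth, the function names, and every simulated number below are assumptions made for the example.

```python
import numpy as np


def local_linear_at_cutoff(x, y, bandwidth, side):
    """Fit a kernel-weighted line on one side of x = 0; return the fit at x = 0."""
    mask = (x >= 0) if side == "north" else (x < 0)
    xs, ys = x[mask], y[mask]
    # Triangular kernel: observations close to the boundary get the most weight.
    w = np.clip(1.0 - np.abs(xs) / bandwidth, 0.0, None)
    keep = w > 0
    xs, ys, w = xs[keep], ys[keep], w[keep]
    # Weighted least squares of y on [1, x]; the intercept is the value at x = 0.
    X = np.column_stack([np.ones_like(xs), xs])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], ys * sw, rcond=None)
    return beta[0]


def rd_jump(x, y, bandwidth=5.0):
    """Difference between the north-side and south-side fits at the boundary."""
    north = local_linear_at_cutoff(x, y, bandwidth, "north")
    south = local_linear_at_cutoff(x, y, bandwidth, "south")
    return north - south


# Synthetic data (hypothetical): a smooth trend in latitude plus a jump at 0.
rng = np.random.default_rng(0)
degrees_north = rng.uniform(-20, 20, size=400)
true_jump = 40.0
pm10 = (100 + 1.5 * degrees_north
        + true_jump * (degrees_north >= 0)
        + rng.normal(0, 5, size=400))

estimate = rd_jump(degrees_north, pm10)
print(f"estimated jump at the boundary: {estimate:.1f}")
```

Plotting the binned averages together with the two fitted lines, as the slide does, lets the reader see the identification strategy directly: the only thing that changes sharply at the boundary is the policy.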

45 56

Visualizing a research strategy

Figure 3: Life Expectancy South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

[Figure: life expectancy (years, 70 to 85) on the vertical axis against degrees north of the Huai River boundary (-20 to 20) on the horizontal axis. Legend: LE in South, LE in North, Local Linear Regression.]

The estimated change in life expectancy (and height of the brace) just north of the Huai River is -3.1 years and is statistically significant (95% CI: -5.0, -1.3).

46 56

Tables

• Use the same principles:
    • Show the data
    • Don't lie about it
    • Focus
• Which translates into:
    • Report the important numbers, not all the coefficients
    • Keep the number of significant digits down
    • No vertical lines
    • Very few horizontal lines; don't box results (3 lines are usually enough)
    • When in doubt, align left
    • Report the statistics that the reader will need, not necessarily those that come by default
• Sample questions to ask yourself:
    • Do you need the R²?
    • Are there important tests to report (say, a test of equality of two coefficients)?
    • You may need the mean in the control group
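
These layout rules (few digits, no vertical lines, three horizontal rules, left alignment, a control-group mean) can be illustrated with a short Python sketch. The helper name `format_table` and all the numbers are hypothetical placeholders, not results from any paper cited in these slides.

```python
def format_table(header, rows, digits=2):
    """Render rows as left-aligned text with exactly three horizontal rules."""
    def cell(value):
        # Floats are rounded to a small, fixed number of digits; counts stay as-is.
        return f"{value:.{digits}f}" if isinstance(value, float) else str(value)

    body = [[cell(v) for v in row] for row in rows]
    table = [list(header)] + body
    widths = [max(len(r[i]) for r in table) for i in range(len(header))]

    def line(cells):
        # Two spaces between columns, no vertical separators.
        return "  ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    rule = "-" * (sum(widths) + 2 * (len(widths) - 1))
    return "\n".join([rule, line(table[0]), rule] + [line(r) for r in body] + [rule])


# Hypothetical numbers, for illustration only.
table_text = format_table(
    header=("", "(1)", "(2)"),
    rows=[
        ("Treatment", 0.12345, -0.06789),
        ("Mean in control", 1.5, 2.25),
        ("Observations", 100, 100),
    ],
)
print(table_text)
```

The point of the sketch is the set of decisions, not the helper itself: which rows to show, how many digits to keep, and where the three rules go.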

47 56

Table 6: Impact of the reform on MGNREGS projects: Evidence from asset survey

                   Number registered          Number found
                   All projects   Ongoing     All projects   Ongoing
                   (1)            (2)         (3)            (4)

Treatment          0.0494         -0.210      0.309          0.0271
                   (0.263)        (0.413)     (0.239)        (0.267)

Observations       390            390         385            385
Mean in control    13.80          11.69       11.79          9.819

Note: The unit of observation is a Gram Panchayat (GP). The dependent variables are the number of projects registered in the public data portal (nrega.nic.in) on May 15, 2013 (1), the number of projects declared as ongoing in nrega.nic.in (2), and the number of registered (3) and ongoing (4) projects found by surveyors in June-July 2013. Out of 5,390 projects registered in nrega.nic.in for the 390 GP of the survey sample, a random sample of 3,900 projects were surveyed (10 per GP). The number of projects found in the survey is scaled up using the ratio of the number of registered projects to the number of sampled projects. 5 GP (28 projects) could not be surveyed. All specifications include district fixed effects.
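
The scaling described in the note can be written out as a one-line calculation. The Python sketch below is only an illustration of that arithmetic: the function name and the example GP (14 registered projects, 10 sampled, 7 found) are hypothetical, not taken from the survey data.

```python
def scale_up_found(found_in_sample, registered, sampled):
    """Scale the sampled count of found projects by registered / sampled."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return found_in_sample * registered / sampled


# Hypothetical GP: 14 projects registered, 10 sampled, 7 of the sampled ones found.
estimated_found = scale_up_found(found_in_sample=7, registered=14, sampled=10)
print(estimated_found)
```

Scaling this way puts columns (3) and (4) on the same footing as columns (1) and (2), so the "found" and "registered" counts can be compared per GP.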

Table 7: Impact of the reform on fake beneficiaries: Evidence from matching of nrega.nic.in job cards with SECC census

                   All job cards         Job cards with at least one working member
                   (as of April 2014)    Intervention period      Post intervention
                                         July 2012-March 2013     Apr 2013-March 2014
                   (1)                   (2)                      (3)

Panel A: Match rate for job cards with one member only
Treatment          0.0187                0.0181                   0.0107
                   (0.00741)             (0.00766)                (0.00696)
Observations       3095                  2868                     2922
Mean in control    0.644                 0.673                    0.698

Panel B: Match rate for job cards with two members or more
Treatment          0.0135                0.0126                   0.0104
                   (0.00613)             (0.00764)                (0.00732)
Observations       3093                  2836                     2906
Mean in control    0.243                 0.282                    0.286

Note: The unit of observation is a GP. The dependent variable is the fraction of job cards from nrega.nic.in matched by name with households from the SECC census. A job card with two members or more is matched when at least two members have been matched by name with a census household. The nrega.nic.in data was extracted from the nrega.nic.in server; it covers the period from July 2011 to March 2014. Treatment is a dummy which is equal to one for the blocks selected for the intervention. All specifications include district fixed effects.

48 56

Stargazer

• In practice, in R, use the "stargazer" package

• It will create standardized tables and can output them for you in HTML, LaTeX, Word, or ASCII

• The default is not bad

• But you can customize it to choose which statistics to include, which coefficients to include, etc.

• https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf

• http://jakeruss.com/cheatsheets/stargazer.html

49 56

Default output

Table 1: Results

                                   Dependent variable: ftvoteshare
                                   (1)            (2)

fncandidates                       0.134***
                                   (0.007)

RESprior                                          0.059**
                                                  (0.024)

Constant                           0.012          0.094***
                                   (0.010)        (0.019)

Observations                       372            372
R2                                 0.532          0.016
Adjusted R2                        0.530          0.013
Residual Std. Error (df = 370)     0.153          0.222
F Statistic (df = 1; 370)          420.093***     6.056**

Note: *p<0.1; **p<0.05; ***p<0.01

50 56

Improve on it

• Use transparent label names for dependent and independent variables

• Choose the coefficients you would like to include

• No need to include both R-squared and adjusted R-squared

• You may want the mean of the dependent variable in the control group rather than the constant

51 56

Visualizing regression results

Figure 1: Overview of Bandhan Results

52 56

Figure 2: Endline 1, At a Glance

53 56

Tools and resources

• R is a great visualization tool (especially ggplot; you need to pick up a good book to choose colors, etc.)

• Nathan Yau's FlowingData has many tutorials on how to do things in R (http://flowingdata.com/category/tutorials/) and a 4-week mini-course you can take at your own pace

• R handles and produces maps as well

• At this site you can find Tufte's charts in R: http://motioninsocial.com/tufte/

54 56

References

• Edward Tufte, The Visual Display of Quantitative Information, Graphics Press, 2013 (second edition).

• Jonathan Schwabish, "An Economist's Guide to Visualizing Data," Journal of Economic Perspectives, 2014, vol. 28, no. 1.

• Banerjee, Chattopadhyay, Duflo, Shapiro, "The long term impact of a graduation program: Evidence from West Bengal," mimeo, MIT.

• Duflo, Esther, "Schooling and Labor Market Consequences of School Construction in Indonesia," American Economic Review, 2001.

• Chetty, Raj, Michael Stepner, Sarah Abraham, Shelby Lin, Benjamin Scuderi, Nicholas Turner, Augustin Bergeron, and David Cutler, "The Association Between Income and Life Expectancy in the United States, 2001-2014," JAMA 315, no. 16 (2016): 1750-1766.

55 56

References

• Bui, Quoctrung, and Neil Irwin, "The Rich Live Longer Everywhere. For the Poor, Geography Matters," New York Times, April 11, 2016.

• Chen, Yuyu, Avraham Ebenstein, Michael Greenstone, and Hongbin Li, "Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China's Huai River Policy."

• Lee, David S., Enrico Moretti, and Matthew J. Butler, "Do Voters Affect or Elect Policies? Evidence from the U.S. House," The Quarterly Journal of Economics (2004): 807-859.

• Klerman, Jacob Alex, and Caroline Danielson, "The Transformation of the Supplemental Nutrition Assistance Program," Journal of Policy Analysis and Management, vol. 30, no. 4, 2011, pp. 863-888.

56 56

References

• Stinebrickner, Ralph, and Todd Stinebrickner, "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model," Western University, CIBC Working Paper 2013-5 (July), http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf, 2013.

• Ottaviano, Gianmarco I. P., and Giovanni Peri, "Immigration and National Wages: Clarifying the Theory and the Empirics," NBER Working Paper 14188 (July), 2008.

• Social Security Advisory Board, Aspects of Disability Decision Making: Data and Materials, Social Security Administration (February), http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf, 2012.

57 56

References

• Social Security Administration, Fast Facts & Figures About Social Security, 2009, Social Security Administration (July), http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf, 2009.

• Hanson, Gordon H., 2012, "The Rise of Middle Kingdoms: Emerging Economies in Global Trade," Journal of Economic Perspectives 26(2): 41-64.

58 56


226 Journal of Economic Perspectives

Alternatively, the stacked bar chart in Figure 9C shows the distribution of the various groups and that the groups sum to 100 percent, while also highlighting differences from one year to the other. Finally, the slope chart in Figure 9D also shows the difference in each category from the first year to the last by pairing points on two vertical axes. Slope charts can be used for a variety of purposes, including showing correlations: for example, the relationship between a state's obesity rate and the share of people with at least a bachelor's degree (Cairo 2013). In this example, the color contrast (or what appears as different shades of grey in the black-and-white printed version) identifies which categories increased over time (blue, darker) and those that declined (orange, lighter).

Figure 9C: Alternative to a Pie Chart: A Stacked Bar Chart

Shares of Aggregate Income, 1962 and 2009 (Percent)

                                1962   2009
Social Security                 30     38
Earnings                        28     29
Other                           18     4
Asset income                    15     11
Government employee pensions    6      9
Private pensions                3      9

Figure 9D: Alternative to a Pie Chart: The Slope Chart

Shares of Aggregate Income, 1962 and 2009 (Percent)

[Figure: slope chart connecting each category's 1962 share (left axis) to its 2009 share (right axis), for the same six categories as Figure 9C.]

39 56

A data set in pictures: Chetty et al., JAMA 2016

40 56

A data set in pictures: Chetty et al., JAMA 2016

women, the comparable changes were 0.23 years in the highest quartile and 0.10 years in the lowest quartile (P < .001). These differences persisted after controlling for the higher growth rate of income for individuals in the top quartile relative to the bottom quartile (eTable 4 in the Supplement).

The lower panels of Figure 3 show the annual increase in race-adjusted life expectancy by income ventiles. The annual increase in longevity was 0.18 years for men (which translates to an increase of 2.34 years from 2001-2014) and 0.22 years for women (an increase of 2.91 years from 2001-2014) in the top 5% of the income distribution. In the bottom 5% of the income distribution, the average annual increase in longevity was 0.02 years (an increase of 0.32 years from 2001-2014) for men and 0.003 years (an increase of 0.04 years from 2001-2014) for women (P < .001 for the differences between top and bottom 5% of income distributions for both sexes).

Local Area Variation in Life Expectancy by Income

Levels of Life Expectancy by Commuting Zone

Life expectancy varied significantly across areas within the United States, especially for low-income individuals. Figure 4 shows life expectancy by income ventile for New York, New York; San Francisco, California; Dallas, Texas; and Detroit, Michigan. There was substantial variation across these areas for low-income individuals but little variation for high-income individuals. Life expectancy ranged from 72.3 years to 78.6 years for men in the lowest income ventile across these 4 cities; the corresponding range for men in the top ventile was 86.5 years to 87.5 years.

The results in Figure 4 are representative of the variation across commuting zones more generally. The SD of life expectancy across all commuting zones (weighted by population) was 1.39 years for men in the bottom income quartile vs 0.70 years in the top income quartile (P < .001). Life expectancy varied less across areas for women than men in the bottom income quartile, and the amount of variation across commuting zones also declined with income for women (eTable 5 in the Supplement).

Figure 5 shows maps of expected age at death by commuting zone for men and women in the bottom and top quartiles of the national income distribution (maps for the middle-income quartiles appear in eFigure 10 in the Supplement). For individuals in the bottom income quartile, life expectancy differed by about 5 years for men and 4 years for women between the lowest and highest longevity commuting zones (P < .001 for both sexes). A summary of standard errors by commuting zone appears in part VC of the eAppendix and in eFigure 11.

Nevada, Indiana, and Oklahoma had the lowest life expectancies (<77.9 years) when men and women in the bottom income quartile were averaged. Of the 10 states with the lowest levels of life expectancy for individuals in the bottom income quartile, 8 formed a geographic belt from Michigan to Kansas (Michigan, Ohio, Indiana, Kentucky, Tennessee, Arkansas, Oklahoma, Kansas). The states with the highest life expectancies for individuals in the bottom income quartile (>80.6 years) were California, New York, and Vermont. Life expectancy in the South was similar to the national mean for both sexes (-0.22 years [P = .47] for women and -0.96 years [P = .03] for men) in the bottom income quartile. Individuals in the top income quartile had the lowest life expectancies (<85.3 years) in Nevada, Hawaii, and Oklahoma. Individuals in the top income quartile had the highest life expectancies (>87.6 years) in Utah, Washington, DC, and Vermont.

Table 1 lists the top 10 and bottom 10 commuting zones in mean life expectancy (averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The expected age at death for the bottom quartile ranged from 74.2 years for men and 80.7 years for women in Gary, Indiana, to 79.5 years for men and 84.0 years for women in New York, New York. The commuting zones with the highest life expectancies

Figure 2. Race- and Ethnicity-Adjusted Life Expectancy for 40-Year-Olds by Household Income Percentile, 2001-2014

[Figure: expected age at death for 40-year-olds (y, 70 to 90) plotted against household income percentile (0 to 100) for women and men, with mean household income in thousands ($) marked along the horizontal axis.]

Expected age at death, y
Women, by household income percentile: bottom 1%: 78.8 (95% CI, 78.7-78.9); top 1%: 88.9 (95% CI, 88.7-89.1)
Men, by household income percentile: bottom 1%: 72.7 (95% CI, 72.6-72.9); top 1%: 87.3 (95% CI, 87.2-87.5)

Life expectancies were calculated using survival curves analogous to those in Figure 1. The vertical height of each bar depicts the 95% confidence interval. The difference between expected age at death in the top and bottom income percentiles is 10.1 years (95% CI, 9.9-10.3 years) for women and 14.6 years (95% CI, 14.4-14.8 years) for men. To control for differences in life expectancies across racial and ethnic groups, race and ethnicity adjustments were calculated using data from the National Longitudinal Mortality Survey, and estimates were reweighted so that each income percentile bin has the same fraction of black, Hispanic, and Asian adults.

a Averaged across years and ages. The data are in thousands unless otherwise indicated.

Association Between Income and Life Expectancy in the United States Special Communication Clinical Review amp Education

jamacom (Reprinted) JAMA Published online April 10 2016 E5

Copyright 2016 American Medical Association All rights reserved

Downloaded From httpjamajamanetworkcom by Suman Ganguli on 04142016

41 56

A data set in pictures Chetty et alJAMA 2016 as rendered in NYT

42 56

A data set in pictures Chetty et alJAMA 2016

Copyright 2016 American Medical Association All rights reserved

were clustered in California (6 of the top 10) whereas the commut-ing zones with the lowest life expectancies were clustered in the in-dustrial Midwest (5 of the bottom 10) The commuting zones withthe highest life expectancies for those in the bottom income quar-tile also had the smallest gaps in life expectancy between the topand bottom quartiles (r = minus082 P lt 001) The expected age at deathfor the top income quartile ranged from 828 years for men and 853years for women in Las Vegas Nevada to 866 years for men and890 years for women in Salt Lake City Utah The areas with the high-est and lowest life expectancies for those in the top income quar-tile were less clustered geographically for example California hadcommuting zones in both the top 10 and bottom 10 of the list

The differences in life expectancy across commuting zones weresimilar in analyses with income measures adjusted for cost of liv-ing with controls for differences across areas in the income distri-bution within each quartile and using measures of loss in life yearsup to the age of 77 years that did not make use of extrapolations be-yond observed ages (part IVC of the eAppendix and eTable 6 in theSupplement) There was also considerable variation in life expec-tancy across counties within commuting zones (part V of theeAppendix eFigure 12 and eTable 7)

Trends in Life ExpectancySimilar to levels of life expectancy temporal trends variedsignificantly across geographic areas Figure 6 maps the annualchange in life expectancy between 2001 and 2014 by state for menand women in the bottom income quartile Hawaii Maine andMassachusetts had the largest gains in life expectancy (gaining gt019years annually) when men and women in the bottom income quar-tile were averaged The states in which low-income individuals ex-perienced the largest losses in life expectancy (losing gt009 yearsannually) were Alaska Iowa and Wyoming

Table 2 lists the top 10 and bottom 10 commuting zones interms of trends in life expectancy (when averaging men and wom-en) among the 100 most populated commuting zones for individu-als in the bottom and top income quartiles The estimated trends

for individuals in the bottom income quartile ranged from anannual gain of 038 years in Toms River New Jersey to an annualloss of 017 years in Tampa Florida Gaps in life expectancybetween the bottom and top income quartiles generally declinedor remained stable in areas in which the bottom income quartileexperienced the largest gains in life expectancy such as Toms RiverNew Jersey In contrast gaps in life expectancy between the topand bottom income quartiles increased by approximately 03 yearsannually in places such as Tampa Florida

Figure 7 shows race- and ethnicity-adjusted life expectanciesby year for men and women in the bottom income quartile in 2 com-muting zones in the top 10 (Birmingham Alabama and CincinnatiOhio) and 2 commuting zones in the bottom 10 (Knoxville Tennes-see and Tampa Florida) This Figure shows that trends in life ex-pectancy across these areas diverged continuously throughout the2000s For example life expectancy increased by approximately 32years from 2001 through 2014 for men and women in CincinnatiOhio but declined by approximately 22 years in Tampa Florida

Correlates of Local Area Variation in Life ExpectancyFigure 8 shows correlations of commuting zone-level estimates ofrace- and ethnicity-adjusted life expectancy for the bottom in-come quartile with local area characteristics The correlations are di-vided into 6 groups health behaviors access to health care envi-ronmental factors income inequality and social cohesion local labormarket conditions and other factors Data for men and women arecombined correlations were similar by sex (eTable 8 in the Supple-ment) County-level correlations were also similar (eTable 9)

Health BehaviorsLife expectancy was negatively correlated with rates of smoking(r = minus069 P lt 001) and obesity (r = minus047 P lt 001) and positivelycorrelated with exercise rates (r = 032 P = 004) among individu-als in the bottom income quartile The maps for rates of smokingobesity and exercise among low-income individuals were similarto those for life expectancy (eFigure 13 in the Supplement)

Figure 4 Race- and Ethnicity-Adjusted Life Expectancy by Income Ventile in Selected Commuting Zones 2001-2014

90

85

80

75

700 5

30

10

60

15

101

20

683

Exp

ecte

d A

ge

at D

eath

fo

r 4

0-Y

ear-

Old

s y

Household Income Ventile

Mean household income in thousands $ a

Men90

85

80

75

700 5

27

10

54

15

95

20

653

Exp

ecte

d A

ge

at D

eath

fo

r 4

0-Y

ear-

Old

s y

Household Income Ventile

Women

Detroit MI

New York NY

San Francisco CA

Dallas TX

Detroit MI

New York NY

San Francisco CA

Dallas TX

Estimates of race- and ethnicity-adjusted expected age at death for 40-year-olds computed by income ventile (5 percentile point bins)a Averaged across years and ages

Association Between Income and Life Expectancy in the United States Special Communication Clinical Review amp Education

jamacom (Reprinted) JAMA Published online April 10 2016 E7

Copyright 2016 American Medical Association All rights reserved

Downloaded From httpjamajamanetworkcom by Suman Ganguli on 04142016

43 56

Visualizing a research strategy

Figure 1

Pollution in China and the Huai RiverQinling Mountain Range

Notes The cities shown are the locations of the Disease Surveillance Points Cities north of the solid line were covered by the home heating policy The figure coloring is generated by interpolating PM10 levels at the 12 nearest pollution monitoring stations to create a high resolution grid of pollution throughout China (1 degree latitude cell width) Areas are left in white which are not within acceptable range of a station

44 56

Visualizing a research strategy

Figure 2Particulate Matter Levels (PM10) South and North of the Huai River Boundary

Notes Each observation (circle) is generated by averaging PM10 across the Disease Surveillance Point locations within a 1 degree latitude range weighted by the population at each location The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range The plotted line reports a local linear regression plot estimated separately on on each side of the Huai River

4060

8010

012

014

016

0

PM10

(4gm

3 )

-20 -15 -10 -5 0 5 10 15 20Degrees North of the Huai River Boundary

PM10 in South PM10 in North Local Linear Regression

The estimated change in PM10 (and height of the brace) just north of the Huai River is 416 microgm3 and is statistically significant (95 CI 116 716)

45 56

Visualinzing a research strategy

Figure 3Life Expectancy South and North of the Huai River Boundary

Notes Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point locations within a 1 degree latitude range weighted by the population at each location The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range The plotted line reports a local linear regression plot estimated separately on on each side of the Huai River

7075

8085

Life

Expe

ctanc

y(Ye

ars)

-20 -15 -10 -5 0 5 10 15 20Degrees North of the Huai River Boundary

LE in South LE in North Local Linear Regression

The estimated change in Life Expectancy (and height of the brace) just north of the Huai River is -31 years and is statistically significant (95 CI -50 -13)

46 56

Tablesbull Use the same principle

bull Show the databull Donrsquot lie about itbull Focus

bull Which translates inbull Report the important numbers not all the coefficientsbull Keep the number of significant digits downbull No vertical linesbull very few horizontal lines donrsquot box results (3 lines are usually

enough)bull in doubt align leftbull Report the statistics that the reader will need not necessary

those that come by defaultbull Sample questions to ask yourself

bull Do you need the R2bull Are there important tests to report (say between equality of

two coefficients)bull You may need the mean in the control group

47 56

Table 6 Impact of the reform on MGNREGS projects Evidence from asset survey

All Projects Ongoing All Projects Ongoing

(1) (2) (3) (4)

Treatment 00494 -0210 0309 00271

(0263) (0413) (0239) (0267)

Observations 390 390 385 385Mean in Control 1380 1169 1179 9819

Number foundNumber Registered

Note the unit of observation is a Gram Panchayat (GP) The dependent variables are the

number of projects registered in the public data portal (nreganicin) on May 15 2013 (1) the

number of projects declared as ongoing in nreganicin (2) the number of registered (3) and

ongoing (4) projects found by surveyors in June-July 2013 Out of 5390 projects registered in

nreganicin for the 390 GP of the survey sample a random sample of 3900 projects were

surveyed (10 per GP) The number of projects found in the survey is scaled up using the

number of registered projects divided by the number of sampled projects rate 5 GP (28

projects) could not be surveyed All specifications include district fixed effects

Table 7: Impact of the reform on fake beneficiaries: Evidence from matching of nrega.nic.in job cards with SECC census

                     All job cards         Job cards with at least one working member
                   (as of April 2014)     Intervention period       Post intervention
                                          July 2012-March 2013     Apr 2013-March 2014
                          (1)                    (2)                      (3)

Panel A: Match rate for job cards with one member only
Treatment               0.0187                 0.0181                   0.0107
                       (0.00741)              (0.00766)                (0.00696)
Observations             3095                   2868                     2922
Mean in Control          0.644                  0.673                    0.698

Panel B: Match rate for job cards with two members or more
Treatment               0.0135                 0.0126                   0.0104
                       (0.00613)              (0.00764)                (0.00732)
Observations             3093                   2836                     2906
Mean in Control          0.243                  0.282                    0.286

Note: The unit of observation is a GP. The dependent variable is the fraction of job cards from nrega.nic.in matched by name with households from the SECC census. A job card with two members or more is matched when at least two members have been matched by name with a census household. The nrega.nic.in data was extracted from the nrega.nic.in server; it covers the period from July 2011 to March 2014. Treatment is a dummy which is equal to one for the blocks selected for the intervention. All specifications include district fixed effects.

48 / 56

Stargazer

• In practice, in R, use the "stargazer" package
• It will create standardized tables and can output them for you in HTML, LaTeX, Word, or ASCII
• The default is not bad
• But you can customize it to choose which statistics to include, which coefficients to include, etc.
• https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf
• http://jakeruss.com/cheatsheets/stargazer.html

49 / 56

Default output

Table 1: Results

                                     Dependent variable:
                                         ftvoteshare
                                      (1)          (2)

fncandidates                       0.134***
                                   (0.007)

RESprior                                         0.059**
                                                 (0.024)

Constant                            0.012        0.094***
                                   (0.010)       (0.019)

Observations                         372           372
R²                                  0.532         0.016
Adjusted R²                         0.530         0.013
Residual Std. Error (df = 370)      0.153         0.222
F Statistic (df = 1; 370)         420.093***      6.056**

Note: *p<0.1; **p<0.05; ***p<0.01

50 / 56

Improve on it

• Transparent label names for dependent and independent variables
• Choose the coefficients you would like to include
• No need to include both R squared and adjusted R squared
• Maybe the mean of the dependent variable in the control group rather than the constant

51 / 56
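The improvements listed above map onto a handful of stargazer arguments. The following is a hedged sketch on simulated data, not the code behind the table shown in the deck: the data frame d, the variables treat, age and outcome, and the labels are all hypothetical. The stargazer call is skipped if the package is not installed.

```r
# Simulated data; every variable name here is hypothetical.
set.seed(1)
n <- 200
d <- data.frame(treat = rbinom(n, 1, 0.5), age = rnorm(n, 40, 10))
d$outcome <- 1 + 0.3 * d$treat + 0.01 * d$age + rnorm(n)

fit1 <- lm(outcome ~ treat, data = d)
fit2 <- lm(outcome ~ treat + age, data = d)

# Mean of the dependent variable in the control group, reported
# in place of the constant.
control_mean <- mean(d$outcome[d$treat == 0])

if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(
    fit1, fit2,
    type = "text",                           # "latex" or "html" also work
    dep.var.labels = "Outcome (simulated)",  # transparent label, dependent variable
    covariate.labels = "Treatment",          # transparent label, kept coefficient
    keep = "treat",                          # show only coefficients matching "treat"
    omit.stat = c("adj.rsq", "ser", "f"),    # drop adjusted R2, residual SE, F statistic
    add.lines = list(c("Mean in control",
                       rep(sprintf("%.3f", control_mean), 2))),
    digits = 3                               # keep significant digits down
  )
}
```

Switching type to "latex" or "html" produces the same table for a paper or a web page without changing the rest of the call.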

Visualizing regression results

Figure 1: Overview of Bandhan Results

52 / 56

Figure 2: Endline 1 - At a Glance

53 / 56
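The two figures above did not survive extraction, so the following does not reproduce them. It is a generic sketch of one common way to show regression results in a picture: a coefficient plot with one point estimate and 95% confidence interval per outcome. The data are simulated, and the outcome names and the helper plot_coefs are hypothetical.

```r
# Simulated data: one treatment dummy and three made-up outcomes.
set.seed(7)
n <- 300
d <- data.frame(treat = rbinom(n, 1, 0.5))
outcomes <- c("consumption", "assets", "income")  # hypothetical names
for (v in outcomes) d[[v]] <- 0.2 * d$treat + rnorm(n)

# One regression per outcome; keep the treatment coefficient and its 95% CI.
est <- do.call(rbind, lapply(outcomes, function(v) {
  fit <- lm(reformulate("treat", response = v), data = d)
  ci  <- confint(fit, "treat", level = 0.95)
  data.frame(outcome  = v,
             estimate = unname(coef(fit)["treat"]),
             lower    = ci[1, 1],
             upper    = ci[1, 2])
}))

# Coefficient plot: point = estimate, horizontal bar = 95% CI,
# dashed vertical line = zero effect.
plot_coefs <- function(est) {
  y <- seq_len(nrow(est))
  plot(est$estimate, y, xlim = range(est$lower, est$upper, 0), pch = 19,
       yaxt = "n", xlab = "Treatment effect (95% CI)", ylab = "")
  segments(est$lower, y, est$upper, y)
  abline(v = 0, lty = 2)
  axis(2, at = y, labels = est$outcome, las = 1)
}
if (interactive()) plot_coefs(est)
```

Putting all outcomes on one axis only makes sense when they share a scale, for example after standardizing each outcome by its control-group standard deviation.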

Tools and resources

• R is a great visualization tool (especially ggplot; you need to pick up a good book to choose colors, etc.)
• Yau's FlowingData has many tutorials on how to do things in R (http://flowingdata.com/category/tutorials) and a 4-week mini-course you can take at your own pace
• R handles/produces maps as well
• At this site you can find Tufte's charts in R: http://motioninsocial.com/tufte

54 / 56

References

• Edward Tufte, Visual Display of Quantitative Information, Graphics Press, 2013 (second edition)
• Jonathan Schwabish, "An Economist's Guide to Visualizing Data," Journal of Economic Perspectives, 2014, vol. 28, number 1
• Banerjee, Chattopadhyay, Duflo, Shapiro, "The long term impact of a graduation program: Evidence from West Bengal," MIMEO, MIT
• Duflo, Esther, "Schooling and Labor Market Consequences of School Construction in Indonesia," American Economic Review, 2001
• Chetty, Raj, Michael Stepner, Sarah Abraham, Shelby Lin, Benjamin Scuderi, Nicholas Turner, Augustin Bergeron, and David Cutler, "The association between income and life expectancy in the United States, 2001-2014," JAMA 315, no. 16 (2016): 1750-1766

55 / 56

References

• Bui, Quoctrung, and Neil Irwin, "The Rich Live Longer Everywhere. For the Poor, Geography Matters," New York Times, April 11, 2016
• Chen, Yuyu, Avraham Ebenstein, Michael Greenstone, and Hongbin Li, "Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China's Huai River Policy"
• Lee, David S., Enrico Moretti, and Matthew J. Butler, "Do voters affect or elect policies? Evidence from the US House," The Quarterly Journal of Economics (2004): 807-859
• Klerman, Jacob Alex, and Caroline Danielson, "The Transformation of the Supplemental Nutrition Assistance Program," Journal of Policy Analysis and Management, vol. 30, no. 4, 2011, pp. 863-888

56 / 56

References

• Stinebrickner, Ralph, and Todd Stinebrickner, "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model," Western University, CIBC Working Paper 2013-5 (July), http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf, 2013
• Ottaviano, Gianmarco I. P., and Giovanni Peri, "Immigration and National Wages: Clarifying the Theory and the Empirics," NBER Working Paper 14188 (July), 2008
• Social Security Advisory Board, Aspects of Disability Decision Making: Data and Materials, Social Security Administration (February), http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf, 2012

57 / 56

References

• Social Security Administration, Fast Facts & Figures About Social Security, 2009, Social Security Administration (July), http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf, 2009
• Hanson, Gordon H., 2012, "The Rise of Middle Kingdoms: Emerging Economies in Global Trade," Journal of Economic Perspectives 26(2): 41-64

58 / 56


A data set in pictures: Chetty et al., JAMA 2016

40 / 56

A data set in pictures: Chetty et al., JAMA 2016

Copyright 2016 American Medical Association. All rights reserved.

women, the comparable changes were 0.23 years in the highest quartile and 0.10 years in the lowest quartile (P < .001). These differences persisted after controlling for the higher growth rate of income for individuals in the top quartile relative to the bottom quartile (eTable 4 in the Supplement).

The lower panels of Figure 3 show the annual increase in race-adjusted life expectancy by income ventiles. The annual increase in longevity was 0.18 years for men (which translates to an increase of 2.34 years from 2001-2014) and 0.22 years for women (an increase of 2.91 years from 2001-2014) in the top 5% of the income distribution. In the bottom 5% of the income distribution, the average annual increase in longevity was 0.02 years (an increase of 0.32 years from 2001-2014) for men and 0.003 years (an increase of 0.04 years from 2001-2014) for women (P < .001 for the differences between top and bottom 5% of income distributions for both sexes).

Local Area Variation in Life Expectancy by Income

Levels of Life Expectancy by Commuting Zone

Life expectancy varied significantly across areas within the United States, especially for low-income individuals. Figure 4 shows life expectancy by income ventile for New York, New York; San Francisco, California; Dallas, Texas; and Detroit, Michigan. There was substantial variation across these areas for low-income individuals but little variation for high-income individuals. Life expectancy ranged from 72.3 years to 78.6 years for men in the lowest income ventile across these 4 cities; the corresponding range for men in the top ventile was 86.5 years to 87.5 years.

The results in Figure 4 are representative of the variation across commuting zones more generally. The SD of life expectancy across all commuting zones (weighted by population) was 1.39 years for men in the bottom income quartile vs 0.70 years in the top income quartile (P < .001). Life expectancy varied less across areas for women than men in the bottom income quartile, and the amount of variation across commuting zones also declined with income for women (eTable 5 in the Supplement).

Figure 5 shows maps of expected age at death by commuting zone for men and women in the bottom and top quartiles of the national income distribution (maps for the middle-income quartiles appear in eFigure 10 in the Supplement). For individuals in the bottom income quartile, life expectancy differed by about 5 years for men and 4 years for women between the lowest and highest longevity commuting zones (P < .001 for both sexes). A summary of standard errors by commuting zone appears in part V.C of the eAppendix and in eFigure 11.

Nevada, Indiana, and Oklahoma had the lowest life expectancies (<77.9 years) when men and women in the bottom income quartile were averaged. Of the 10 states with the lowest levels of life expectancy for individuals in the bottom income quartile, 8 formed a geographic belt from Michigan to Kansas (Michigan, Ohio, Indiana, Kentucky, Tennessee, Arkansas, Oklahoma, Kansas). The states with the highest life expectancies for individuals in the bottom income quartile (>80.6 years) were California, New York, and Vermont. Life expectancy in the South was similar to the national mean for both sexes (-0.22 years [P = .47] for women and -0.96 years [P = .03] for men) in the bottom income quartile. Individuals in the top income quartile had the lowest life expectancies (<85.3 years) in Nevada, Hawaii, and Oklahoma. Individuals in the top income quartile had the highest life expectancies (>87.6 years) in Utah, Washington, DC, and Vermont.

Table 1 lists the top 10 and bottom 10 commuting zones in mean life expectancy (averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The expected age at death for the bottom quartile ranged from 74.2 years for men and 80.7 years for women in Gary, Indiana, to 79.5 years for men and 84.0 years for women in New York, New York. The commuting zones with the highest life expectancies

Figure 2: Race- and Ethnicity-Adjusted Life Expectancy for 40-Year-Olds by Household Income Percentile, 2001-2014
Series: Women, Men. x-axis: Household Income Percentile (mean household income in thousands, $). y-axis: Expected Age at Death for 40-Year-Olds, y.

Expected age at death, y:
Women, by household income percentile: Bottom 1%: 78.8 (95% CI, 78.7-78.9); Top 1%: 88.9 (95% CI, 88.7-89.1)
Men, by household income percentile: Bottom 1%: 72.7 (95% CI, 72.6-72.9); Top 1%: 87.3 (95% CI, 87.2-87.5)

Life expectancies were calculated using survival curves analogous to those in Figure 1. The vertical height of each bar depicts the 95% confidence interval. The difference between expected age at death in the top and bottom income percentiles is 10.1 years (95% CI, 9.9-10.3 years) for women and 14.6 years (95% CI, 14.4-14.8 years) for men. To control for differences in life expectancies across racial and ethnic groups, race and ethnicity adjustments were calculated using data from the National Longitudinal Mortality Survey, and estimates were reweighted so that each income percentile bin has the same fraction of black, Hispanic, and Asian adults.
a Averaged across years and ages. The data are in thousands unless otherwise indicated.

Association Between Income and Life Expectancy in the United States. Special Communication, Clinical Review & Education.

jama.com (Reprinted) JAMA. Published online April 10, 2016. E5

41 / 56

A data set in pictures: Chetty et al., JAMA 2016, as rendered in NYT

42 / 56

A data set in pictures: Chetty et al., JAMA 2016

were clustered in California (6 of the top 10), whereas the commuting zones with the lowest life expectancies were clustered in the industrial Midwest (5 of the bottom 10). The commuting zones with the highest life expectancies for those in the bottom income quartile also had the smallest gaps in life expectancy between the top and bottom quartiles (r = -0.82, P < .001). The expected age at death for the top income quartile ranged from 82.8 years for men and 85.3 years for women in Las Vegas, Nevada, to 86.6 years for men and 89.0 years for women in Salt Lake City, Utah. The areas with the highest and lowest life expectancies for those in the top income quartile were less clustered geographically; for example, California had commuting zones in both the top 10 and bottom 10 of the list.

The differences in life expectancy across commuting zones were similar in analyses with income measures adjusted for cost of living, with controls for differences across areas in the income distribution within each quartile, and using measures of loss in life years up to the age of 77 years that did not make use of extrapolations beyond observed ages (part IV.C of the eAppendix and eTable 6 in the Supplement). There was also considerable variation in life expectancy across counties within commuting zones (part V of the eAppendix, eFigure 12, and eTable 7).

Trends in Life Expectancy

Similar to levels of life expectancy, temporal trends varied significantly across geographic areas. Figure 6 maps the annual change in life expectancy between 2001 and 2014 by state for men and women in the bottom income quartile. Hawaii, Maine, and Massachusetts had the largest gains in life expectancy (gaining >0.19 years annually) when men and women in the bottom income quartile were averaged. The states in which low-income individuals experienced the largest losses in life expectancy (losing >0.09 years annually) were Alaska, Iowa, and Wyoming.

Table 2 lists the top 10 and bottom 10 commuting zones in terms of trends in life expectancy (when averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The estimated trends for individuals in the bottom income quartile ranged from an annual gain of 0.38 years in Toms River, New Jersey, to an annual loss of 0.17 years in Tampa, Florida. Gaps in life expectancy between the bottom and top income quartiles generally declined or remained stable in areas in which the bottom income quartile experienced the largest gains in life expectancy, such as Toms River, New Jersey. In contrast, gaps in life expectancy between the top and bottom income quartiles increased by approximately 0.3 years annually in places such as Tampa, Florida.

Figure 7 shows race- and ethnicity-adjusted life expectancies by year for men and women in the bottom income quartile in 2 commuting zones in the top 10 (Birmingham, Alabama, and Cincinnati, Ohio) and 2 commuting zones in the bottom 10 (Knoxville, Tennessee, and Tampa, Florida). This figure shows that trends in life expectancy across these areas diverged continuously throughout the 2000s. For example, life expectancy increased by approximately 3.2 years from 2001 through 2014 for men and women in Cincinnati, Ohio, but declined by approximately 2.2 years in Tampa, Florida.

Correlates of Local Area Variation in Life Expectancy

Figure 8 shows correlations of commuting zone-level estimates of race- and ethnicity-adjusted life expectancy for the bottom income quartile with local area characteristics. The correlations are divided into 6 groups: health behaviors, access to health care, environmental factors, income inequality and social cohesion, local labor market conditions, and other factors. Data for men and women are combined; correlations were similar by sex (eTable 8 in the Supplement). County-level correlations were also similar (eTable 9).

Health Behaviors

Life expectancy was negatively correlated with rates of smoking (r = -0.69, P < .001) and obesity (r = -0.47, P < .001) and positively correlated with exercise rates (r = 0.32, P = .004) among individuals in the bottom income quartile. The maps for rates of smoking, obesity, and exercise among low-income individuals were similar to those for life expectancy (eFigure 13 in the Supplement).

Figure 4: Race- and Ethnicity-Adjusted Life Expectancy by Income Ventile in Selected Commuting Zones, 2001-2014
Panels: Men, Women. Series: Detroit, MI; New York, NY; San Francisco, CA; Dallas, TX. x-axis: Household Income Ventile (mean household income in thousands, $). y-axis: Expected Age at Death for 40-Year-Olds, y.

Estimates of race- and ethnicity-adjusted expected age at death for 40-year-olds computed by income ventile (5 percentile point bins).
a Averaged across years and ages.


43 / 56

Visualizing a research strategy

Figure 1: Pollution in China and the Huai River/Qinling Mountain Range

Notes: The cities shown are the locations of the Disease Surveillance Points. Cities north of the solid line were covered by the home heating policy. The figure coloring is generated by interpolating PM10 levels at the 12 nearest pollution monitoring stations to create a high-resolution grid of pollution throughout China (1 degree latitude cell width). Areas that are not within acceptable range of a station are left in white.

44 / 56

Visualizing a research strategy

Figure 2: Particulate Matter Levels (PM10) South and North of the Huai River Boundary
x-axis: Degrees North of the Huai River Boundary (-20 to 20). y-axis: PM10 (μg/m³). Series: PM10 in South, PM10 in North, Local Linear Regression.

The estimated change in PM10 (and height of the brace) just north of the Huai River is 41.6 μg/m³ and is statistically significant (95% CI: 11.6, 71.6).

Notes: Each observation (circle) is generated by averaging PM10 across the Disease Surveillance Point (DSP) locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

45 / 56

Visualizing a research strategy

Figure 3: Life Expectancy South and North of the Huai River Boundary
x-axis: Degrees North of the Huai River Boundary (-20 to 20). y-axis: Life Expectancy (Years). Series: LE in South, LE in North, Local Linear Regression.

The estimated change in Life Expectancy (and height of the brace) just north of the Huai River is -3.1 years and is statistically significant (95% CI: -5.0, -1.3).

Notes: Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point (DSP) locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

46 / 56
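The construction described in the notes to the two Huai River figures (population-weighted averages in 1-degree latitude bins, circle size proportional to bin population, and a separate fit on each side of the boundary) can be sketched as follows. This is an illustration on simulated data and not the authors' code: all numbers and names are hypothetical, and a plain weighted linear fit on each side stands in for the local linear regression used in the figures.

```r
# Simulated data only: every number and variable name here is hypothetical.
set.seed(42)
n <- 150
loc <- data.frame(
  deg_north = runif(n, -20, 20),         # degrees north of the boundary
  pop       = round(runif(n, 1e4, 1e6))  # population at each location
)
# Outcome with an arbitrary jump of 40 at the boundary
loc$pm10 <- 100 + 1.5 * loc$deg_north + 40 * (loc$deg_north >= 0) +
  rnorm(n, sd = 20)

# 1. One circle per 1-degree latitude bin: population-weighted mean of the
#    outcome, with the bin's total population kept for the circle size.
loc$bin <- floor(loc$deg_north)
bins <- do.call(rbind, lapply(split(loc, loc$bin), function(g) {
  data.frame(
    bin_mid  = g$bin[1] + 0.5,
    pm10_bar = weighted.mean(g$pm10, g$pop),
    pop_tot  = sum(g$pop)
  )
}))

# 2. Separate population-weighted linear fits on each side of the boundary
#    (a simplification of the local linear regression in the figures).
south <- lm(pm10 ~ deg_north, data = loc, weights = pop, subset = deg_north < 0)
north <- lm(pm10 ~ deg_north, data = loc, weights = pop, subset = deg_north >= 0)

# 3. Estimated jump at the boundary: difference between the two intercepts,
#    since both fits are evaluated at deg_north = 0.
jump <- unname(coef(north)["(Intercept)"] - coef(south)["(Intercept)"])

# 4. Binned scatter with circle area proportional to bin population.
plot_rd <- function(bins, south, north) {
  plot(bins$bin_mid, bins$pm10_bar,
       cex = 3 * sqrt(bins$pop_tot / max(bins$pop_tot)),
       xlab = "Degrees north of the boundary", ylab = "Outcome (simulated)")
  xs <- seq(-20, 0, length.out = 50)
  lines(xs, predict(south, data.frame(deg_north = xs)))
  lines(-xs, predict(north, data.frame(deg_north = -xs)))
  abline(v = 0, lty = 2)
}
if (interactive()) plot_rd(bins, south, north)
```

Plotting the binned means rather than every location keeps the picture readable, and fitting each side separately is what makes the jump at the boundary visible.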


Page 13: 216 Journal of Economic Perspectives - edX€¦ · Also notice the olum n chart layout (recall Figure 3; also see Schw abish 2013c). A lso notice the ((subjective) decision to omit

A data set in pictures Chetty et alJAMA 2016

Copyright 2016 American Medical Association All rights reserved

women the comparable changes were 023 years in the highestquartile and 010 years in the lowest quartile (P lt 001) These dif-ferences persisted after controlling for the higher growth rate ofincome for individuals in the top quartile relative to the bottomquartile (eTable 4 in the Supplement)

The lower panels of Figure 3 show the annual increase in race-adjusted life expectancy by income ventiles The annual increase inlongevity was 018 years for men (which translates to an increase of234 years from 2001-2014) and 022 years for women (an in-crease of 291 years from 2001-2014) in the top 5 of the incomedistribution In the bottom 5 of the income distribution the av-erage annual increase in longevity was 002 years (an increase of 032years from 2001-2014) for men and 0003 years (an increase of 004years from 2001-2014) for women (P lt 001 for the differences be-tween top and bottom 5 of income distributions for both sexes)

Local Area Variation in Life Expectancy by IncomeLevels of Life Expectancy by Commuting ZoneLife expectancy varied significantly across areas within the UnitedStates especially for low-income individuals Figure 4 shows life ex-pectancy by income ventile for New York New York San FranciscoCalifornia Dallas Texas and Detroit Michigan There was substan-tial variation across these areas for low-income individuals but littlevariation for high-income individuals Life expectancy ranged from723 years to 786 years for men in the lowest income ventile acrossthese 4 cities the corresponding range for men in the top ventilewas 865 years to 875 years

The results in Figure 4 are representative of the variation acrosscommuting zones more generally The SD of life expectancy acrossall commuting zones (weighted by population) was 139 years formen in the bottom income quartile vs 070 years in the top incomequartile (P lt 001) Life expectancy varied less across areas forwomen than men in the bottom income quartile and the amount

of variation across commuting zones also declined with income forwomen (eTable 5 in the Supplement)

Figure 5 shows maps of expected age at death by commutingzone for men and women in the bottom and top quartiles of the na-tional income distribution (maps for the middle-income quartiles ap-pear in eFigure 10 in the Supplement) For individuals in the bot-tom income quartile life expectancy differed by about 5 years formen and 4 years for women between the lowest and highest lon-gevity commuting zones (P lt 001 for both sexes) A summary ofstandard errors by commuting zone appears in part VC of theeAppendix and in eFigure 11

Nevada Indiana and Oklahoma had the lowest life expectan-cies (lt779 years) when men and women in the bottom income quar-tile were averaged Of the 10 states with the lowest levels of life ex-pectancy for individuals in the bottom income quartile 8 formed ageographic belt from Michigan to Kansas (Michigan Ohio IndianaKentucky Tennessee Arkansas Oklahoma Kansas) The states withthe highest life expectancies for individuals in the bottom incomequartile (gt806 years) were California New York and Vermont Lifeexpectancy in the South was similar to the national mean for bothsexes (minus022 years [P = 47] for women and minus096 years [P = 03]for men) in the bottom income quartile Individuals in the top in-come quartile had the lowest life expectancies (lt853 years) inNevada Hawaii and Oklahoma Individuals in the top income quar-tile had the highest life expectancies (gt876 years) in Utah Wash-ington DC and Vermont

Table 1 lists the top 10 and bottom 10 commuting zones in meanlife expectancy (averaging men and women) among the 100 mostpopulated commuting zones for individuals in the bottom and topincome quartiles The expected age at death for the bottom quar-tile ranged from 742 years for men and 807 years for women in GaryIndiana to 795 years for men and 840 years for women in New YorkNew York The commuting zones with the highest life expectancies

Figure 2 Race- and Ethnicity-Adjusted Life Expectancy for 40-Year-Olds by Household Income Percentile 2001-2014

90

85

80

75

70

0 100

19 million

20 million

80

112

119

60

71

77

40

45

50

Exp

ecte

d A

ge

at D

eath

fo

r 4

0-Y

ear-

Old

s y

Household Income Percentile

Mean household income in thousands $ a

WomenMen

20

24

26

Women

Men

Expected age at death y

Bottom 1 788 (95 CI 787-789)Women by household income percentile

Top 1 889 (95 CI 887-891)

Bottom 1 727 (95 CI 726-729)Men by household income percentile

Top 1 873 (95 CI 872-875)

Life expectancies were calculated using survival curves analogous to those inFigure 1 The vertical height of each bar depicts the 95 confidence intervalThe difference between expected age at death in the top and bottom incomepercentiles is 101 years (95 CI 99-103 years) for women and 146 years(95 CI 144-148 years) for men To control for differences in life expectanciesacross racial and ethnic groups race and ethnicity adjustments were calculated

using data from the National Longitudinal Mortality Survey and estimates werereweighted so that each income percentile bin has the same fraction of blackHispanic and Asian adultsa Averaged across years and ages The data are in thousands unless otherwise

indicated

Association Between Income and Life Expectancy in the United States Special Communication Clinical Review amp Education

jamacom (Reprinted) JAMA Published online April 10 2016 E5

Copyright 2016 American Medical Association All rights reserved

Downloaded From httpjamajamanetworkcom by Suman Ganguli on 04142016

41 56

A data set in pictures Chetty et alJAMA 2016 as rendered in NYT

42 56

A data set in pictures Chetty et alJAMA 2016

Copyright 2016 American Medical Association All rights reserved

were clustered in California (6 of the top 10), whereas the commuting zones with the lowest life expectancies were clustered in the industrial Midwest (5 of the bottom 10). The commuting zones with the highest life expectancies for those in the bottom income quartile also had the smallest gaps in life expectancy between the top and bottom quartiles (r = -0.82, P < .001). The expected age at death for the top income quartile ranged from 82.8 years for men and 85.3 years for women in Las Vegas, Nevada, to 86.6 years for men and 89.0 years for women in Salt Lake City, Utah. The areas with the highest and lowest life expectancies for those in the top income quartile were less clustered geographically; for example, California had commuting zones in both the top 10 and bottom 10 of the list.

The differences in life expectancy across commuting zones were similar in analyses with income measures adjusted for cost of living, with controls for differences across areas in the income distribution within each quartile, and using measures of loss in life years up to the age of 77 years that did not make use of extrapolations beyond observed ages (part IV.C of the eAppendix and eTable 6 in the Supplement). There was also considerable variation in life expectancy across counties within commuting zones (part V of the eAppendix, eFigure 12, and eTable 7).

Trends in Life Expectancy

Similar to levels of life expectancy, temporal trends varied significantly across geographic areas. Figure 6 maps the annual change in life expectancy between 2001 and 2014 by state for men and women in the bottom income quartile. Hawaii, Maine, and Massachusetts had the largest gains in life expectancy (gaining >0.19 years annually) when men and women in the bottom income quartile were averaged. The states in which low-income individuals experienced the largest losses in life expectancy (losing >0.09 years annually) were Alaska, Iowa, and Wyoming.

Table 2 lists the top 10 and bottom 10 commuting zones in terms of trends in life expectancy (when averaging men and women) among the 100 most populated commuting zones for individuals in the bottom and top income quartiles. The estimated trends for individuals in the bottom income quartile ranged from an annual gain of 0.38 years in Toms River, New Jersey, to an annual loss of 0.17 years in Tampa, Florida. Gaps in life expectancy between the bottom and top income quartiles generally declined or remained stable in areas in which the bottom income quartile experienced the largest gains in life expectancy, such as Toms River, New Jersey. In contrast, gaps in life expectancy between the top and bottom income quartiles increased by approximately 0.3 years annually in places such as Tampa, Florida.

Figure 7 shows race- and ethnicity-adjusted life expectancies by year for men and women in the bottom income quartile in 2 commuting zones in the top 10 (Birmingham, Alabama, and Cincinnati, Ohio) and 2 commuting zones in the bottom 10 (Knoxville, Tennessee, and Tampa, Florida). This Figure shows that trends in life expectancy across these areas diverged continuously throughout the 2000s. For example, life expectancy increased by approximately 3.2 years from 2001 through 2014 for men and women in Cincinnati, Ohio, but declined by approximately 2.2 years in Tampa, Florida.

Correlates of Local Area Variation in Life Expectancy

Figure 8 shows correlations of commuting zone-level estimates of race- and ethnicity-adjusted life expectancy for the bottom income quartile with local area characteristics. The correlations are divided into 6 groups: health behaviors, access to health care, environmental factors, income inequality and social cohesion, local labor market conditions, and other factors. Data for men and women are combined; correlations were similar by sex (eTable 8 in the Supplement). County-level correlations were also similar (eTable 9).

Health Behaviors

Life expectancy was negatively correlated with rates of smoking (r = -0.69, P < .001) and obesity (r = -0.47, P < .001) and positively correlated with exercise rates (r = 0.32, P = .004) among individuals in the bottom income quartile. The maps for rates of smoking, obesity, and exercise among low-income individuals were similar to those for life expectancy (eFigure 13 in the Supplement).

Figure 4. Race- and Ethnicity-Adjusted Life Expectancy by Income Ventile in Selected Commuting Zones, 2001-2014

[Line chart with two panels, Men and Women. y-axis: Expected Age at Death for 40-Year-Olds, y; x-axis: Household Income Ventile, with mean household income in thousands, $ (a). Series: Detroit, MI; New York, NY; San Francisco, CA; Dallas, TX.]

Estimates of race- and ethnicity-adjusted expected age at death for 40-year-olds computed by income ventile (5 percentile point bins).

a Averaged across years and ages.


Visualizing a research strategy

Figure 1: Pollution in China and the Huai River/Qinling Mountain Range

Notes: The cities shown are the locations of the Disease Surveillance Points. Cities north of the solid line were covered by the home heating policy. The figure coloring is generated by interpolating PM10 levels at the 12 nearest pollution monitoring stations to create a high-resolution grid of pollution throughout China (1 degree latitude cell width). Areas that are not within acceptable range of a station are left in white.

Visualizing a research strategy

Figure 2: Particulate Matter Levels (PM10) South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging PM10 across the Disease Surveillance Point (DSP) locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression estimated separately on each side of the Huai River.

[Scatter plot. y-axis: PM10 (μg/m³), ticks from 40 to 160; x-axis: Degrees North of the Huai River Boundary, from -20 to 20. Legend: PM10 in South, PM10 in North, Local Linear Regression.]

The estimated change in PM10 (and height of the brace) just north of the Huai River is 41.6 μg/m³ and is statistically significant (95% CI: 11.6, 71.6).

Visualizing a research strategy

Figure 3: Life Expectancy South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point (DSP) locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression estimated separately on each side of the Huai River.

[Scatter plot. y-axis: Life Expectancy (Years), ticks from 70 to 85; x-axis: Degrees North of the Huai River Boundary, from -20 to 20. Legend: LE in South, LE in North, Local Linear Regression.]

The estimated change in Life Expectancy (and height of the brace) just north of the Huai River is -3.1 years and is statistically significant (95% CI: -5.0, -1.3).
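
The construction described in the notes to Figures 2 and 3 can be sketched in a few lines. The Python below is only an illustration, not the authors' code: the function names, the triangular kernel, the 5-degree bandwidth, the use of population as a regression weight, and the demo data are all assumptions made for the example. It averages an outcome within 1-degree latitude bins using population weights, then fits a kernel-weighted linear regression separately on each side of the boundary and reports the gap between the two fitted values at 0 degrees.

```python
import math
from collections import defaultdict


def weighted_bin_means(points):
    """points: iterable of (degrees_north, value, population).

    Returns {bin: (population-weighted mean, total population)} for
    1-degree latitude bins, where bin = floor(degrees_north).
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # bin -> [sum(w * y), sum(w)]
    for x, y, w in points:
        b = math.floor(x)
        sums[b][0] += w * y
        sums[b][1] += w
    return {b: (swy / sw, sw) for b, (swy, sw) in sums.items()}


def local_linear_at_zero(points, bandwidth):
    """Fitted value at the boundary (x = 0) from a weighted linear fit.

    Weights are a triangular kernel in distance from the boundary times
    population (both are assumptions of this sketch).
    """
    s_w = s_wx = s_wy = s_wxx = s_wxy = 0.0
    for x, y, w in points:
        k = max(0.0, 1.0 - abs(x) / bandwidth) * w
        s_w += k
        s_wx += k * x
        s_wy += k * y
        s_wxx += k * x * x
        s_wxy += k * x * y
    denom = s_w * s_wxx - s_wx ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x values inside the bandwidth")
    slope = (s_w * s_wxy - s_wx * s_wy) / denom
    return (s_wy - slope * s_wx) / s_w  # intercept = fitted value at x = 0


def discontinuity(points, bandwidth=5.0):
    """North-minus-south gap in the fitted values at the boundary."""
    south = [p for p in points if p[0] < 0]
    north = [p for p in points if p[0] >= 0]
    return local_linear_at_zero(north, bandwidth) - local_linear_at_zero(south, bandwidth)


# Hypothetical data: the outcome rises by 2 per degree and jumps by 40 at the boundary.
demo = [(d + 0.5, (140.0 if d >= 0 else 100.0) + 2.0 * (d + 0.5), 1000.0)
        for d in range(-5, 5)]
print(discontinuity(demo))  # about 40 by construction
```

The point of the picture is that the reader sees the same object the regression estimates: the vertical gap between the two fitted lines where they meet the boundary.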


Tables

• Use the same principles:
  • Show the data
  • Don't lie about it
  • Focus
• Which translates into:
  • Report the important numbers, not all the coefficients
  • Keep the number of significant digits down
  • No vertical lines
  • Very few horizontal lines; don't box results (3 lines are usually enough)
  • When in doubt, align left
  • Report the statistics that the reader will need, not necessarily those that come by default
• Sample questions to ask yourself:
  • Do you need the R²?
  • Are there important tests to report (say, a test of equality of two coefficients)?
  • You may need the mean in the control group


Table 6: Impact of the reform on MGNREGS projects: Evidence from asset survey

                      Number registered            Number found
                   All projects    Ongoing     All projects    Ongoing
                       (1)           (2)           (3)           (4)
Treatment             0.0494       -0.210         0.309         0.0271
                     (0.263)       (0.413)       (0.239)       (0.267)
Observations           390           390           385           385
Mean in control       13.80         11.69         11.79         9.819

Note: The unit of observation is a Gram Panchayat (GP). The dependent variables are the number of projects registered in the public data portal (nrega.nic.in) on May 15, 2013 (1); the number of projects declared as ongoing in nrega.nic.in (2); and the number of registered (3) and ongoing (4) projects found by surveyors in June-July 2013. Out of 5,390 projects registered in nrega.nic.in for the 390 GPs of the survey sample, a random sample of 3,900 projects was surveyed (10 per GP). The number of projects found in the survey is scaled up using the ratio of the number of registered projects to the number of sampled projects. 5 GPs (28 projects) could not be surveyed. All specifications include district fixed effects.
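
The scaling step in the note to Table 6 is simple arithmetic, sketched here in Python. The function name and the example GP are hypothetical; only the rule itself (found in sample, times registered, divided by sampled) comes from the note.

```python
def scale_up_found(found_in_sample, n_sampled, n_registered):
    """Scale the number of projects found among the sampled ones up to the GP level."""
    if n_sampled <= 0:
        raise ValueError("n_sampled must be positive")
    return found_in_sample * n_registered / n_sampled


# Hypothetical GP: 14 registered projects, 10 sampled, 8 of those found on site.
print(scale_up_found(found_in_sample=8, n_sampled=10, n_registered=14))

# Sanity check on the note: 5,390 registered projects over 390 GPs
# is roughly 13.8 registered projects per GP.
print(round(5390 / 390, 1))
```

Scaling this way puts columns (3) and (4) in the same units as columns (1) and (2), so the "found" and "registered" counts can be compared directly.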

Table 7: Impact of the reform on fake beneficiaries: Evidence from matching of nrega.nic.in job cards with SECC census

                      All job cards        Job cards with at least one working member
                    (as of April 2014)     Intervention period       Post intervention
                                          July 2012-March 2013     Apr 2013-March 2014
                           (1)                    (2)                     (3)

Panel A: Match rate for job cards with one member only
Treatment                0.0187                 0.0181                  0.0107
                        (0.00741)              (0.00766)               (0.00696)
Observations              3095                   2868                    2922
Mean in control           0.644                  0.673                   0.698

Panel B: Match rate for job cards with two members or more
Treatment                0.0135                 0.0126                  0.0104
                        (0.00613)              (0.00764)               (0.00732)
Observations              3093                   2836                    2906
Mean in control           0.243                  0.282                   0.286

Note: The unit of observation is a GP. The dependent variable is the fraction of job cards from nrega.nic.in matched by name with households from the SECC census. A job card with two members or more is matched when at least two members have been matched by name with a census household. The nrega.nic.in data were extracted from the nrega.nic.in server and cover the period from July 2011 to March 2014. Treatment is a dummy equal to one for the blocks selected for the intervention. All specifications include district fixed effects.
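
The dependent variable in Table 7 can be illustrated with a short Python sketch. The data layout (one list of per-member match flags per job card) and the function names are assumptions made for the example; the two-member rule comes from the table note, while the rule for single-member cards (matched when that one member is matched) is an assumption.

```python
def card_is_matched(member_matches):
    """member_matches: list of booleans, one per member on the job card."""
    n_matched = sum(member_matches)
    if len(member_matches) == 1:
        return n_matched == 1   # assumed rule for single-member cards
    return n_matched >= 2       # rule stated in the note to Table 7


def match_rate(cards):
    """Fraction of a GP's job cards that are matched to a census household."""
    if not cards:
        raise ValueError("no job cards")
    return sum(card_is_matched(c) for c in cards) / len(cards)


# Hypothetical GP with three multi-member job cards.
gp_cards = [[True, False], [True, True, False], [False, False]]
print(match_rate(gp_cards))  # one card out of three has at least two matched members
```

Because the rule for multi-member cards is stricter, the lower control means in Panel B than in Panel A are what one would expect mechanically.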


Stargazer

• In practice, in R, use the "stargazer" package

• It will create standardized tables and can output them for you in HTML, LaTeX, Word, or ASCII

• The default is not bad

• But you can customize it to choose what statistics to include, choose what coefficients to include, etc.

• https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf

• http://jakeruss.com/cheatsheets/stargazer.html

Default output

Table 1: Results

                                   Dependent variable: ftvoteshare
                                       (1)            (2)
fncandidates                         0.134***
                                    (0.007)
RESprior                                             0.059**
                                                    (0.024)
Constant                             0.012           0.094***
                                    (0.010)         (0.019)
Observations                          372             372
R2                                   0.532           0.016
Adjusted R2                          0.530           0.013
Residual Std. Error (df = 370)       0.153           0.222
F Statistic (df = 1; 370)          420.093***        6.056**

Note: *p<0.1; **p<0.05; ***p<0.01

Improve on it

• Transparent label names for dependent and independent variables

• Choose the coefficients you would like to include

• No need to include both R squared and adjusted R squared

• Maybe report the mean of the dependent variable in the control group rather than the constant
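
One way to see the last point: in a regression on a treatment dummy alone, the constant is exactly the control-group mean, but once fixed effects or other controls enter (as in Tables 6 and 7), the constant depends on the omitted category and is harder to read. The Python sketch below checks the first half of that statement on made-up data; the function names and the numbers are hypothetical.

```python
def ols_simple(x, y):
    """OLS of y on a constant and a single regressor x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def control_mean(treat, y):
    """Mean of the outcome among control units (treat == 0)."""
    values = [yi for t, yi in zip(treat, y) if t == 0]
    return sum(values) / len(values)


# Hypothetical data: three control units and two treated units.
treat = [0, 0, 0, 1, 1]
outcome = [10.0, 12.0, 14.0, 15.0, 17.0]

intercept, effect = ols_simple(treat, outcome)
print(intercept, effect, control_mean(treat, outcome))
```

Reporting the control mean directly gives the reader the benchmark for judging the size of the treatment coefficient, whatever else is in the specification.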


Visualizing regression results

Figure 1: Overview of Bandhan Results

Figure 2: Endline 1 - At a Glance

Tools and resources

• R is a great visualization tool (especially ggplot; you need to pick up a good book to choose colors, etc.)

• Yau's FlowingData has many tutorials on how to do things in R (http://flowingdata.com/category/tutorials) and a 4-week mini-course you can take at your own pace

• R handles/produces maps as well

• At this site you can find Tufte's charts in R: http://motioninsocial.com/tufte

References

• Edward Tufte, Visual Display of Quantitative Information, Graphics Press, 2013 (second edition)

• Jonathan Schwabish, "An Economist's Guide to Visualizing Data," Journal of Economic Perspectives, 2014, vol. 28, number 1

• Banerjee, Chattopadhyay, Duflo, Shapiro, "The long term impact of a graduation program: Evidence from West Bengal," mimeo, MIT

• Duflo, Esther, "Schooling and Labor Market Consequences of School Construction in Indonesia," American Economic Review, 2001

• Chetty, Raj, Michael Stepner, Sarah Abraham, Shelby Lin, Benjamin Scuderi, Nicholas Turner, Augustin Bergeron, and David Cutler, "The association between income and life expectancy in the United States, 2001-2014," JAMA 315, no. 16 (2016): 1750-1766

• Bui, Quoctrung, and Neil Irwin, "The Rich Live Longer Everywhere. For the Poor, Geography Matters," New York Times, April 11, 2016

• Chen, Yuyu, Avraham Ebenstein, Michael Greenstone, and Hongbin Li, "Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China's Huai River Policy"

• Lee, David S., Enrico Moretti, and Matthew J. Butler, "Do voters affect or elect policies? Evidence from the US House," The Quarterly Journal of Economics (2004): 807-859

• Klerman, Jacob Alex, and Caroline Danielson, "The Transformation of the Supplemental Nutrition Assistance Program," Journal of Policy Analysis and Management, vol. 30, no. 4, 2011, pp. 863-888

• Stinebrickner, Ralph, and Todd Stinebrickner, "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model," Western University, CIBC Working Paper 2013-5 (July), 2013. http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf

• Ottaviano, Gianmarco I. P., and Giovanni Peri, "Immigration and National Wages: Clarifying the Theory and the Empirics," NBER Working Paper 14188 (July), 2008

• Social Security Advisory Board, "Aspects of Disability Decision Making: Data and Materials," Social Security Administration (February), 2012. http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf

• Social Security Administration, "Fast Facts & Figures About Social Security, 2009," Social Security Administration (July), 2009. http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf

• Hanson, Gordon H., "The Rise of Middle Kingdoms: Emerging Economies in Global Trade," Journal of Economic Perspectives 26(2): 41-64, 2012


Page 15: 216 Journal of Economic Perspectives - edX€¦ · Also notice the olum n chart layout (recall Figure 3; also see Schw abish 2013c). A lso notice the ((subjective) decision to omit

A data set in pictures Chetty et alJAMA 2016

Copyright 2016 American Medical Association All rights reserved

were clustered in California (6 of the top 10) whereas the commut-ing zones with the lowest life expectancies were clustered in the in-dustrial Midwest (5 of the bottom 10) The commuting zones withthe highest life expectancies for those in the bottom income quar-tile also had the smallest gaps in life expectancy between the topand bottom quartiles (r = minus082 P lt 001) The expected age at deathfor the top income quartile ranged from 828 years for men and 853years for women in Las Vegas Nevada to 866 years for men and890 years for women in Salt Lake City Utah The areas with the high-est and lowest life expectancies for those in the top income quar-tile were less clustered geographically for example California hadcommuting zones in both the top 10 and bottom 10 of the list

The differences in life expectancy across commuting zones weresimilar in analyses with income measures adjusted for cost of liv-ing with controls for differences across areas in the income distri-bution within each quartile and using measures of loss in life yearsup to the age of 77 years that did not make use of extrapolations be-yond observed ages (part IVC of the eAppendix and eTable 6 in theSupplement) There was also considerable variation in life expec-tancy across counties within commuting zones (part V of theeAppendix eFigure 12 and eTable 7)

Trends in Life ExpectancySimilar to levels of life expectancy temporal trends variedsignificantly across geographic areas Figure 6 maps the annualchange in life expectancy between 2001 and 2014 by state for menand women in the bottom income quartile Hawaii Maine andMassachusetts had the largest gains in life expectancy (gaining gt019years annually) when men and women in the bottom income quar-tile were averaged The states in which low-income individuals ex-perienced the largest losses in life expectancy (losing gt009 yearsannually) were Alaska Iowa and Wyoming

Table 2 lists the top 10 and bottom 10 commuting zones interms of trends in life expectancy (when averaging men and wom-en) among the 100 most populated commuting zones for individu-als in the bottom and top income quartiles The estimated trends

for individuals in the bottom income quartile ranged from anannual gain of 038 years in Toms River New Jersey to an annualloss of 017 years in Tampa Florida Gaps in life expectancybetween the bottom and top income quartiles generally declinedor remained stable in areas in which the bottom income quartileexperienced the largest gains in life expectancy such as Toms RiverNew Jersey In contrast gaps in life expectancy between the topand bottom income quartiles increased by approximately 03 yearsannually in places such as Tampa Florida

Figure 7 shows race- and ethnicity-adjusted life expectanciesby year for men and women in the bottom income quartile in 2 com-muting zones in the top 10 (Birmingham Alabama and CincinnatiOhio) and 2 commuting zones in the bottom 10 (Knoxville Tennes-see and Tampa Florida) This Figure shows that trends in life ex-pectancy across these areas diverged continuously throughout the2000s For example life expectancy increased by approximately 32years from 2001 through 2014 for men and women in CincinnatiOhio but declined by approximately 22 years in Tampa Florida

Correlates of Local Area Variation in Life Expectancy

Figure 8 shows correlations of commuting zone-level estimates of race- and ethnicity-adjusted life expectancy for the bottom income quartile with local area characteristics. The correlations are divided into 6 groups: health behaviors, access to health care, environmental factors, income inequality and social cohesion, local labor market conditions, and other factors. Data for men and women are combined; correlations were similar by sex (eTable 8 in the Supplement). County-level correlations were also similar (eTable 9).

Health Behaviors

Life expectancy was negatively correlated with rates of smoking (r = −0.69, P < .001) and obesity (r = −0.47, P < .001) and positively correlated with exercise rates (r = 0.32, P = .004) among individuals in the bottom income quartile. The maps for rates of smoking, obesity, and exercise among low-income individuals were similar to those for life expectancy (eFigure 13 in the Supplement).

Figure 4. Race- and Ethnicity-Adjusted Life Expectancy by Income Ventile in Selected Commuting Zones, 2001-2014

[Figure: two panels, Men and Women. x-axis: Household Income Ventile (with mean household income in thousands, $ a). y-axis: Expected Age at Death for 40-Year-Olds, y (70 to 90). Series in each panel: Detroit, MI; New York, NY; San Francisco, CA; Dallas, TX.]

Estimates of race- and ethnicity-adjusted expected age at death for 40-year-olds, computed by income ventile (5 percentile point bins).
a Averaged across years and ages.

Association Between Income and Life Expectancy in the United States. Special Communication, Clinical Review & Education

jama.com (Reprinted) JAMA. Published online April 10, 2016. E7

Copyright 2016 American Medical Association. All rights reserved.


Visualizing a research strategy

Figure 1: Pollution in China and the Huai River/Qinling Mountain Range

Notes: The cities shown are the locations of the Disease Surveillance Points. Cities north of the solid line were covered by the home heating policy. The figure coloring is generated by interpolating PM10 levels at the 12 nearest pollution monitoring stations to create a high-resolution grid of pollution throughout China (1 degree latitude cell width). Areas that are not within acceptable range of a station are left in white.

Visualizing a research strategy

Figure 2: Particulate Matter Levels (PM10) South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging PM10 across the Disease Surveillance Point locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

[Figure: scatter plot with fitted lines. x-axis: Degrees North of the Huai River Boundary (-20 to 20). y-axis: PM10 (µg/m³), 40 to 160. Legend: PM10 in South, PM10 in North, Local Linear Regression.]

The estimated change in PM10 (and height of the brace) just north of the Huai River is 41.6 µg/m³ and is statistically significant (95% CI: 11.6, 71.6).
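The construction described in the figure notes (population-weighted averages within 1 degree latitude bins, then a linear fit estimated separately on each side of the boundary) can be sketched in a few lines. This is only a minimal Python sketch on made-up data; the deck itself recommends R. The function names, the 5 degree bandwidth, and the triangular kernel are assumptions chosen for illustration, not the authors' specification.

```python
from collections import defaultdict
from math import floor


def weighted_bin_means(points, width=1.0):
    """Population-weighted mean of y within latitude bins of the given width.

    points: iterable of (degrees_north, y, population) tuples.
    Returns {bin_left_edge: (weighted_mean, total_population)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # bin -> [sum(pop * y), sum(pop)]
    for x, y, pop in points:
        left = floor(x / width) * width
        sums[left][0] += pop * y
        sums[left][1] += pop
    return {b: (s / w, w) for b, (s, w) in sorted(sums.items())}


def boundary_fit(points, side, bandwidth=5.0):
    """Kernel-weighted linear fit of y on x using one side of the boundary only.

    Weights come from a triangular kernel centred on the boundary (x = 0).
    Returns the fitted value at x = 0, i.e. the intercept.
    """
    sw = sx = sy = sxx = sxy = 0.0
    for x, y, _pop in points:
        on_side = x >= 0 if side == "north" else x < 0
        k = 1.0 - abs(x) / bandwidth
        if not on_side or k <= 0:
            continue
        sw += k
        sx += k * x
        sy += k * y
        sxx += k * x * x
        sxy += k * x * y
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    return (sy - slope * sx) / sw


# Hypothetical data: (degrees north of the boundary, outcome, population).
south = [(x, 80 + 0.5 * x, 100) for x in (-4.5, -3.5, -2.5, -1.5, -0.5)]
north = [(x, 120 + 0.5 * x, 100) for x in (0.5, 1.5, 2.5, 3.5, 4.5)]
data = south + north + [(-0.2, 79.9, 300)]

bins = weighted_bin_means(data)  # the circles in the plot
jump = boundary_fit(data, "north") - boundary_fit(data, "south")  # the brace
print(round(jump, 3))
```

The bin means correspond to the circles (with total population giving the circle size), and the difference between the two intercepts at the boundary corresponds to the height of the brace.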

Visualizing a research strategy

Figure 3: Life Expectancy South and North of the Huai River Boundary

Notes: Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point locations within a 1 degree latitude range, weighted by the population at each location. The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range. The plotted line reports a local linear regression plot estimated separately on each side of the Huai River.

[Figure: scatter plot with fitted lines. x-axis: Degrees North of the Huai River Boundary (-20 to 20). y-axis: Life Expectancy (Years), 70 to 85. Legend: LE in South, LE in North, Local Linear Regression.]

The estimated change in Life Expectancy (and height of the brace) just north of the Huai River is -3.1 years and is statistically significant (95% CI: -5.0, -1.3).

Tables

• Use the same principles:
  • Show the data
  • Don't lie about it
  • Focus
• Which translates into:
  • Report the important numbers, not all the coefficients
  • Keep the number of significant digits down
  • No vertical lines
  • Very few horizontal lines; don't box results (3 lines are usually enough)
  • When in doubt, align left
• Report the statistics that the reader will need, not necessarily those that come by default
• Sample questions to ask yourself:
  • Do you need the R²?
  • Are there important tests to report (say, a test of equality of two coefficients)?
  • You may need the mean in the control group
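As a rough illustration of these rules, the sketch below renders a small results table as plain text: three horizontal rules, no vertical lines, left-aligned cells, three significant digits, and a control-group mean in place of default statistics. It is written in Python purely for illustration (the deck recommends R for real tables); the helper names `sig` and `render_table` and all the numbers are hypothetical.

```python
from math import floor, log10


def sig(x, digits=3):
    """Format x with `digits` significant digits, keeping trailing zeros."""
    if x == 0:
        return "0"
    decimals = max(digits - 1 - floor(log10(abs(x))), 0)
    return f"{x:.{decimals}f}"


def render_table(columns, rows, width=10):
    """Plain-text table with three horizontal rules and no vertical lines."""
    label_w = max(len(label) for label, _ in rows) + 2
    rule = "-" * (label_w + width * len(columns))
    header = " " * label_w + "".join(c.ljust(width) for c in columns)
    body = [label.ljust(label_w) + "".join(v.ljust(width) for v in values)
            for label, values in rows]
    return "\n".join([rule, header, rule] + body + [rule])


# Hypothetical unrounded estimates, used only to exercise the formatting.
rows = [
    ("Treatment", [sig(0.04938), sig(-0.21)]),
    ("", [f"({sig(0.2634)})", f"({sig(0.4131)})"]),
    ("Observations", ["390", "390"]),
    ("Mean in control", [sig(13.8), sig(11.69)]),
]
table = render_table(["(1)", "(2)"], rows)
print(table)
```

Everything is left-aligned here to follow the "when in doubt, align left" rule; aligning numbers on the decimal point is a common alternative that this sketch does not attempt.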


Table 6: Impact of the reform on MGNREGS projects: Evidence from asset survey

                   Number registered            Number found
                   All projects   Ongoing       All projects   Ongoing
                   (1)            (2)           (3)            (4)
Treatment          0.0494         -0.210        0.309          0.0271
                   (0.263)        (0.413)       (0.239)        (0.267)
Observations       390            390           385            385
Mean in control    13.80          11.69         11.79          9.819

Note: The unit of observation is a Gram Panchayat (GP). The dependent variables are the number of projects registered in the public data portal (nrega.nic.in) on May 15, 2013 (1), the number of projects declared as ongoing in nrega.nic.in (2), and the number of registered (3) and ongoing (4) projects found by surveyors in June-July 2013. Out of 5390 projects registered in nrega.nic.in for the 390 GP of the survey sample, a random sample of 3900 projects were surveyed (10 per GP). The number of projects found in the survey is scaled up using the ratio of the number of registered projects to the number of sampled projects. 5 GP (28 projects) could not be surveyed. All specifications include district fixed effects.
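The scaling step in the note to Table 6 is simple arithmetic: the count found among the sampled projects is multiplied by registered / sampled. A minimal Python sketch follows, with a hypothetical function name and a made-up GP; it is not taken from the paper's code.

```python
def scaled_found(found_in_sample, registered, sampled):
    """Scale the count found among sampled projects up to all registered projects."""
    if sampled <= 0:
        raise ValueError("at least one project must be sampled")
    return found_in_sample * registered / sampled


# Hypothetical GP: 14 projects registered, 10 sampled, 8 of the sampled ones found.
estimate = scaled_found(found_in_sample=8, registered=14, sampled=10)
print(estimate)
```

With these made-up inputs, 8 found projects out of 10 sampled scale up to 8 × 14 / 10 projects for the GP as a whole.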

Table 7: Impact of the reform on fake beneficiaries: Evidence from matching of nrega.nic.in job cards with SECC census

                   All job cards         Job cards with at least one working member
                   (as of April 2014)    Intervention period      Post intervention
                                         July 2012-March 2013     Apr 2013-March 2014
                   (1)                   (2)                      (3)

Panel A: Match rate for job cards with one member only
Treatment          0.0187                0.0181                   0.0107
                   (0.00741)             (0.00766)                (0.00696)
Observations       3095                  2868                     2922
Mean in control    0.644                 0.673                    0.698

Panel B: Match rate for job cards with two members or more
Treatment          0.0135                0.0126                   0.0104
                   (0.00613)             (0.00764)                (0.00732)
Observations       3093                  2836                     2906
Mean in control    0.243                 0.282                    0.286

Note: The unit of observation is a GP. The dependent variable is the fraction of job cards from nrega.nic.in matched by name with households from the SECC census. A job card with two members or more is matched when at least two members have been matched by name with a census household. The nrega.nic.in data was extracted from the nrega.nic.in server; it covers the period from July 2011 to March 2014. Treatment is a dummy equal to one for the blocks selected for the intervention. All specifications include district fixed effects.
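The matching rule in the note to Table 7 can be written down as a short function. The Python sketch below rests on assumptions the note does not spell out: names are compared exactly after lower-casing and collapsing whitespace, and for cards with two or more members the two matched members must belong to the same census household. All names and data are made up.

```python
def normalize(name):
    """Lower-case a name and collapse repeated whitespace (assumed rule)."""
    return " ".join(name.lower().split())


def job_card_matched(members, census_households):
    """True if the job card matches some census household.

    One-member cards need that member matched; cards with two or more
    members need at least two members matched in the same household.
    """
    needed = 1 if len(members) == 1 else 2
    for household in census_households:
        names = {normalize(n) for n in household}
        hits = sum(1 for m in members if normalize(m) in names)
        if hits >= needed:
            return True
    return False


def match_rate(job_cards, census_households):
    """Fraction of job cards matched, e.g. computed per GP."""
    matched = sum(job_card_matched(c, census_households) for c in job_cards)
    return matched / len(job_cards)


# Hypothetical census households and job cards.
census = [["Asha Devi", "Ram Kumar"], ["Sita Ram"]]
cards = [
    ["asha devi"],                # one member, matched
    ["Ram  Kumar", "Asha Devi"],  # two members, both in one household
    ["Ram Kumar", "Mohan Lal"],   # two members, only one matched
    ["Gita Rani"],                # one member, not matched
]
rate = match_rate(cards, census)
print(rate)
```

In the table, this rate is the dependent variable, computed separately for one-member cards (Panel A) and cards with two or more members (Panel B).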


Stargazer

• In practice, in R, use the "stargazer" package

• It creates standardized tables and can output them for you in HTML, LaTeX, Word, or ASCII

• The default is not bad

• But you can customize it to choose which statistics to include, which coefficients to include, etc.

• https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf

• http://jakeruss.com/cheatsheets/stargazer.html

Default output

Table 1: Results

                                  Dependent variable: ftvoteshare
                                  (1)            (2)
fncandidates                      0.134***
                                  (0.007)
RESprior                                         0.059**
                                                 (0.024)
Constant                          0.012          0.094***
                                  (0.010)        (0.019)
Observations                      372            372
R2                                0.532          0.016
Adjusted R2                       0.530          0.013
Residual Std. Error (df = 370)    0.153          0.222
F Statistic (df = 1; 370)         420.093***     6.056**

Note: *p<0.1; **p<0.05; ***p<0.01

Improve on it

• Use transparent label names for dependent and independent variables

• Choose the coefficients you would like to include

• No need to include both R squared and adjusted R squared

• Maybe report the mean of the dependent variable in the control group rather than the constant

Visualizing regression results

Figure 1: Overview of Bandhan Results

Figure 2: Endline 1-At a Glance

Tools and resources

• R is a great visualization tool (especially ggplot; you need to pick up a good book to choose colors, etc.)

• Yau's FlowingData has many tutorials on how to do things in R (http://flowingdata.com/category/tutorials/) and a 4-week mini-course you can take at your own pace

• R handles/produces maps as well

• At this site you can find Tufte's charts in R: http://motioninsocial.com/tufte/

References

• Edward Tufte, Visual Display of Quantitative Information, Graphics Press, 2013 (second edition)

• Jonathan Schwabish, "An Economist's guide to visualizing data," Journal of Economic Perspectives, 2014, vol. 28, number 1

• Banerjee, Chattopadhyay, Duflo, Shapiro, "The long term impact of a graduation program: Evidence from West Bengal," MIMEO, MIT

• Duflo, Esther, "Schooling and Labor Market Consequences of School Construction in Indonesia," American Economic Review, 2001

• Chetty, Raj, Michael Stepner, Sarah Abraham, Shelby Lin, Benjamin Scuderi, Nicholas Turner, Augustin Bergeron, and David Cutler, "The association between income and life expectancy in the United States, 2001-2014," JAMA 315, no. 16 (2016): 1750-1766

• Bui, Quoctrung, and Neil Irwin, "The Rich Live Longer Everywhere. For the Poor, Geography Matters," New York Times, April 11, 2016

• Chen, Yuyu, Avraham Ebenstein, Michael Greenstone, and Hongbin Li, "Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China's Huai River Policy"

• Lee, David S., Enrico Moretti, and Matthew J. Butler, "Do voters affect or elect policies? Evidence from the US House," The Quarterly Journal of Economics (2004): 807-859

• Klerman, Jacob Alex, and Caroline Danielson, "The Transformation of the Supplemental Nutrition Assistance Program," Journal of Policy Analysis and Management, vol. 30, no. 4, 2011, pp. 863-888

• Stinebrickner, Ralph, and Todd Stinebrickner, "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model," Western University, CIBC Working Paper 2013-5 (July), http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf, 2013

• Ottaviano, Gianmarco I. P., and Giovanni Peri, "Immigration and National Wages: Clarifying the Theory and the Empirics," NBER Working Paper 14188 (July), 2008

• Social Security Advisory Board, Aspects of Disability Decision Making: Data and Materials, Social Security Administration (February), http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf, 2012

• Social Security Administration, Fast Facts & Figures About Social Security, 2009, Social Security Administration (July), http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf, 2009

• Hanson, Gordon H., 2012, "The Rise of Middle Kingdoms: Emerging Economies in Global Trade," Journal of Economic Perspectives 26(2): 41-64



Page 18: 216 Journal of Economic Perspectives - edX€¦ · Also notice the olum n chart layout (recall Figure 3; also see Schw abish 2013c). A lso notice the ((subjective) decision to omit

Visualinzing a research strategy

Figure 3Life Expectancy South and North of the Huai River Boundary

Notes Each observation (circle) is generated by averaging life expectancy across the Disease Surveillance Point locations within a 1 degree latitude range weighted by the population at each location The size of the circle is in proportion to the total population at DSP locations within the 1 degree latitude range The plotted line reports a local linear regression plot estimated separately on on each side of the Huai River

7075

8085

Life

Expe

ctanc

y(Ye

ars)

-20 -15 -10 -5 0 5 10 15 20Degrees North of the Huai River Boundary

LE in South LE in North Local Linear Regression

The estimated change in Life Expectancy (and height of the brace) just north of the Huai River is -31 years and is statistically significant (95 CI -50 -13)

46 56

Tablesbull Use the same principle

bull Show the databull Donrsquot lie about itbull Focus

bull Which translates inbull Report the important numbers not all the coefficientsbull Keep the number of significant digits downbull No vertical linesbull very few horizontal lines donrsquot box results (3 lines are usually

enough)bull in doubt align leftbull Report the statistics that the reader will need not necessary

those that come by defaultbull Sample questions to ask yourself

bull Do you need the R2bull Are there important tests to report (say between equality of

two coefficients)bull You may need the mean in the control group

47 56

Table 6: Impact of the reform on MGNREGS projects: Evidence from asset survey

                   Number registered           Number found
                   All projects   Ongoing      All projects   Ongoing
                   (1)            (2)          (3)            (4)

Treatment          0.0494         -0.210       0.309          0.0271
                   (0.263)        (0.413)      (0.239)        (0.267)

Observations       390            390          385            385
Mean in Control    13.80          11.69        11.79          9.819

Note: The unit of observation is a Gram Panchayat (GP). The dependent variables are the number of projects registered in the public data portal (nrega.nic.in) on May 15, 2013 (1); the number of projects declared as ongoing in nrega.nic.in (2); and the number of registered (3) and ongoing (4) projects found by surveyors in June-July 2013. Out of 5390 projects registered in nrega.nic.in for the 390 GP of the survey sample, a random sample of 3900 projects were surveyed (10 per GP). The number of projects found in the survey is scaled up using the number of registered projects divided by the number of sampled projects. 5 GP (28 projects) could not be surveyed. All specifications include district fixed effects.
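The scaling step in the note to Table 6 amounts to one line of arithmetic. The sketch below assumes the scaling is applied GP by GP, which the note does not state explicitly; the function name and the example GP are hypothetical.

```python
def scaled_projects_found(found_in_sample, registered, sampled):
    """Scale the number of sampled projects found up to all registered projects."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return found_in_sample * registered / sampled


# Hypothetical GP: 14 projects registered, 10 sampled, 8 of those found on the ground.
estimate = scaled_projects_found(found_in_sample=8, registered=14, sampled=10)
print(estimate)
```

With these invented numbers, the 8 projects found among 10 sampled correspond to an estimated 11.2 projects found among the 14 registered.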

Table 7: Impact of the reform on fake beneficiaries: Evidence from matching of nrega.nic.in job cards with SECC census

                   All job cards         Job cards with at least one working member
                   (as of April 2014)    Intervention period       Post intervention
                                         July 2012-March 2013      Apr 2013-March 2014
                   (1)                   (2)                       (3)

Panel A: Match rate for job cards with one member only

Treatment          0.0187                0.0181                    0.0107
                   (0.00741)             (0.00766)                 (0.00696)
Observations       3095                  2868                      2922
Mean in Control    0.644                 0.673                     0.698

Panel B: Match rate for job cards with two members or more

Treatment          0.0135                0.0126                    0.0104
                   (0.00613)             (0.00764)                 (0.00732)
Observations       3093                  2836                      2906
Mean in Control    0.243                 0.282                     0.286

Note: The unit of observation is a GP. The dependent variable is the fraction of job cards from nrega.nic.in matched by name with households from the SECC census. A job card with two members or more is matched when at least two members have been matched by name with a census household. The nrega.nic.in data was extracted from the nrega.nic.in server; it covers the period from July 2011 to March 2014. Treatment is a dummy which is equal to one for the blocks selected for the intervention. All specifications include district fixed effects.

Stargazer

• In practice, in R, use the "stargazer" package

• It will create standardized tables and can output them for you in HTML, LaTeX, Word, or ASCII

• The default is not bad

• But you can customize it to choose what statistics to include, what coefficients to include, etc.

• https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf

• http://jakeruss.com/cheatsheets/stargazer.html

Default output

Table 1: Results

                                 Dependent variable: ftvoteshare
                                 (1)             (2)

fncandidates                     0.134***
                                 (0.007)

RESprior                                         0.059**
                                                 (0.024)

Constant                         0.012           0.094***
                                 (0.010)         (0.019)

Observations                     372             372
R2                               0.532           0.016
Adjusted R2                      0.530           0.013
Residual Std. Error (df = 370)   0.153           0.222
F Statistic (df = 1; 370)        420.093***      6.056**

Note: *p<0.1; **p<0.05; ***p<0.01
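The note under the default table encodes a simple threshold rule for the significance stars. A minimal sketch of that rule, with a hypothetical helper name, looks like this:

```python
def significance_stars(p_value):
    """Map a p-value to stars using the thresholds in the table note."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.1:
        return "*"
    return ""


# Hypothetical p-values, chosen to hit each branch.
labels = [significance_stars(p) for p in (0.001, 0.03, 0.08, 0.4)]
print(labels)
```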

Improve on it

• Use transparent label names for dependent and independent variables

• Choose the coefficients you would like to include

• No need to include both R squared and adjusted R squared

• Maybe report the mean of the dependent variable in the control group rather than the constant
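One way to see the last point: in a regression of the outcome on a treatment dummy alone, the constant already is the control-group mean, so labeling it as such is more transparent. Once fixed effects or other controls enter, the constant no longer has that reading, and the control mean has to be computed and reported separately. The sketch below checks the first case on made-up data; the function name and the data are assumptions for illustration.

```python
from statistics import mean


def ols_on_dummy(y, d):
    """OLS of y on a constant and a single regressor d; returns (intercept, slope)."""
    y_bar, d_bar = mean(y), mean(d)
    covariance = sum((di - d_bar) * (yi - y_bar) for yi, di in zip(y, d))
    variance = sum((di - d_bar) ** 2 for di in d)
    slope = covariance / variance
    return y_bar - slope * d_bar, slope


# Made-up outcomes for three control (d = 0) and three treated (d = 1) units.
y = [0.2, 0.4, 0.3, 0.7, 0.5, 0.6]
d = [0, 0, 0, 1, 1, 1]

intercept, slope = ols_on_dummy(y, d)
control_mean = mean(yi for yi, di in zip(y, d) if di == 0)
print(intercept, control_mean, slope)
```

Here the intercept and the control mean coincide up to floating-point error, and the slope is the difference between the treated and control means.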

Visualizing regression results

[Figure 1: Overview of Bandhan Results]

[Figure 2: Endline 1 - At a Glance]

Tools and resources

• R is a great visualization tool (especially ggplot). You need to pick up a good book to choose colors, etc.

• Yau's FlowingData has many tutorials on how to do things in R (http://flowingdata.com/category/tutorials/) and a 4-week mini-course you can take at your own pace

• R handles and produces maps as well

• At this site you can find Tufte's charts in R: http://motioninsocial.com/tufte/

References

• Tufte, Edward. The Visual Display of Quantitative Information. Graphics Press, 2013 (second edition).

• Schwabish, Jonathan. "An Economist's Guide to Visualizing Data." Journal of Economic Perspectives, 2014, vol. 28, number 1.

• Banerjee, Chattopadhyay, Duflo, Shapiro. "The long term impact of a graduation program: Evidence from West Bengal." Mimeo, MIT.

• Duflo, Esther. "Schooling and Labor Market Consequences of School Construction in Indonesia." American Economic Review, 2001.

• Chetty, Raj, Michael Stepner, Sarah Abraham, Shelby Lin, Benjamin Scuderi, Nicholas Turner, Augustin Bergeron, and David Cutler. "The association between income and life expectancy in the United States, 2001-2014." JAMA 315, no. 16 (2016): 1750-1766.

• Bui, Quoctrung, and Neil Irwin. "The Rich Live Longer Everywhere. For the Poor, Geography Matters." New York Times, April 11, 2016.

• Chen, Yuyu, Avraham Ebenstein, Michael Greenstone, and Hongbin Li. "Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China's Huai River Policy."

• Lee, David S., Enrico Moretti, and Matthew J. Butler. "Do voters affect or elect policies? Evidence from the US House." The Quarterly Journal of Economics (2004): 807-859.

• Klerman, Jacob Alex, and Caroline Danielson. "The Transformation of the Supplemental Nutrition Assistance Program." Journal of Policy Analysis and Management, vol. 30, no. 4, 2011, pp. 863-888.

• Stinebrickner, Ralph, and Todd Stinebrickner. "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model." Western University, CIBC Working Paper 2013-5 (July). http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf. 2013.

• Ottaviano, Gianmarco I. P., and Giovanni Peri. "Immigration and National Wages: Clarifying the Theory and the Empirics." NBER Working Paper 14188 (July), 2008.

• Social Security Advisory Board. Aspects of Disability Decision Making: Data and Materials. Social Security Administration (February). http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf. 2012.

• Social Security Administration. Fast Facts & Figures About Social Security, 2009. Social Security Administration (July). http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf. 2009.

• Hanson, Gordon H. 2012. "The Rise of Middle Kingdoms: Emerging Economies in Global Trade." Journal of Economic Perspectives 26(2): 41-64.





Page 23: 216 Journal of Economic Perspectives - edX€¦ · Also notice the olum n chart layout (recall Figure 3; also see Schw abish 2013c). A lso notice the ((subjective) decision to omit

Improve on it

bull Transparent label names for dependent and independentvariables

bull Choose the coefficients you would like to include

bull No need to include both R squared and adjusted R squared

bull Maybe the need of the dependent variable in the controlgroup rather than the constant

51 56

Visualizing regression results

Figure 1 Overview of Bandhan Results

1

52 56

Figure 2 Endline 1-At a Glance

2

53 56

Tools and resources

bull R is a great visualization tool (especially ggplot Need to pickup a good book to chose colors etc)

bull Yau Flowing data has many tutorials for how to do things inR (httpflowingdatacomcategorytutorials) and a 4weeks mini-course you can take at your own pace

bull R handlesproduces maps as well

bull At this site you can find a Tuftersquos charts in Rhttpmotioninsocialcomtufte

54 56

Referencesbull Edward Tufte Visual Display of Quantitative Information

Graphics Press 2013 (second edition)bull Jonathan Schwabish ldquoAn Economistrsquos guide to visualizing

datardquo Journal of Economic Perspective 2014 vol 28 number1

bull Banerjee Chattopadhyay Duflo Shapiro ldquoThe long termimpact of a graduation program Evidence from West BengalrdquoMIMEO MIT

bull Duflo Esther ldquoSchooling and Labor Market Consequences ofSchool construction in Indonesiardquo American EconomicReview 2001

bull Chetty Raj Michael Stepner Sarah Abraham Shelby LinBenjamin Scuderi Nicholas Turner Augustin Bergeron andDavid CutlerldquoThe association between income and lifeexpectancy in the United States 2001-2014rdquo JAMA 315 no16 (2016) 1750-1766

55 56

References

bull Bui Quoctrung and Neil IrwinldquoThe Rich Live LongerEverywhere For the Poor Geography Matters New YorkTImes April 11 2016

bull Chen Yuyu Avraham Ebenstein Michael Greenstone andHongbin Li ldquoEvidence on the Impact of Sustained Exposureto Air Pollution on Life Expectancy from Chinalsquos Huai RiverPolicyrdquo

bull Lee David S Enrico Moretti and Matthew J Butler ldquoDovoters affect or elect policies Evidence from the US HouserdquoThe Quarterly Journal of Economics (2004) 807-859

bull Klerman Jacob Alex and Caroline Danielson ldquoTheTransformation of the Supplemental Nutrition AssistanceProgram Journal of Policy Analysis and Management vol30 no 4 2011 pp 863888

56 56

References

• Stinebrickner, Ralph, and Todd Stinebrickner, “Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model,” Western University CIBC Working Paper 2013-5 (July), http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf, 2013

• Ottaviano, Gianmarco I. P., and Giovanni Peri, “Immigration and National Wages: Clarifying the Theory and the Empirics,” NBER Working Paper 14188 (July), 2008

• Social Security Advisory Board, “Aspects of Disability Decision Making: Data and Materials,” Social Security Administration (February), http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf, 2012

• Social Security Administration, “Fast Facts & Figures About Social Security, 2009,” Social Security Administration (July), http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf, 2009

• Hanson, Gordon H., 2012, “The Rise of Middle Kingdoms: Emerging Economies in Global Trade,” Journal of Economic Perspectives 26(2): 41-64
