Geographic Boundaries as Regression Discontinuities∗

Luke Keele† Rocío Titiunik‡

May 3, 2014

Abstract

Political scientists often turn to natural experiments to draw causal inferences with observational data. Recently, the regression discontinuity design (RD) has become a popular type of natural experiment due to its relatively weak assumptions. We study a special type of regression discontinuity design where the discontinuity in treatment assignment is geographic. In this design, which we call the Geographic Regression Discontinuity (GRD) design, a geographic or administrative boundary splits units into treated and control areas, and analysts make the case that the division into treated and control areas occurs in an as-if random fashion. We show how this design is equivalent to a standard RD with two running variables, but we also clarify several methodological differences that arise in geographical contexts. We also offer a method for estimation of geographically-located treatment effects that can also be used to validate the identification assumptions using observable pretreatment characteristics. We illustrate our methodological framework with a re-examination of the effects of political advertisements on voter turnout during a presidential campaign, exploiting the exogenous variation in the volume of presidential ads that is created by media market boundaries.

∗Authors are in alphabetical order. We thank the Associate Editor Betsy Sinclair, two anonymous referees, Lisa Blaydes, Matias Cattaneo, Don Green, Justin Grimmer, Danny Hidalgo, Simon Jackman, Marc Meredith, Clayton Nall, Ellie Powell, Randy Stevenson, Wendy Tam Cho, Jonathan Wand, Teppei Yamamoto and seminar participants at the University of Michigan, Stanford University, Yale University, Duke University, the London School of Hygiene and Tropical Medicine, and Penn State University for valuable comments and discussion. Titiunik gratefully acknowledges financial support from the National Science Foundation (SES 1357561). An earlier version of this paper was the winner of a 2010 Atlantic Causal Inference Conference Thomas R. Ten Have Citation for “exceptionally creative or skillful research on causal inference.” Parts of this manuscript were previously circulated in a working paper entitled “Geography as a Causal Variable”.

†Associate Professor, Department of Political Science, 211 Pond Lab, Penn State University, University Park, PA 16802. Phone: 814-863-1592, Email: [email protected]

‡Assistant Professor, Department of Political Science, P.O. Box 1248, University of Michigan, Ann Arbor, MI 48106. Phone: 734-936-2939, Email: [email protected]

1 Introduction

Selection and endogeneity are often key threats to inference in the social sciences. Recently,

analysts have turned to natural experiments and quasi-experimental methods as one way

to overcome these obstacles in observational studies—studies where the assignment of the

treatment of interest is not under the researcher’s control. Among these techniques, the

regression discontinuity (RD) design has been revived with great fanfare. Lee and Lemieux

(2010, p.282) summarize the promise that surrounds this design, attributing the recent wave

of RD studies to “the belief that the RD design is not ‘just another’ evaluation strategy

and that causal inferences from RD designs are potentially more credible than those from

typical ‘natural experiment’ strategies.” Moreover, analysts have used RD designs to recover

experimental benchmarks, which has only bolstered their credibility (Green et al. 2009; Cook

et al. 2008). The use of RD designs has exploded recently. Lee and Lemieux (2010) count

78 applications of RD designs in economics, and the design is spreading quickly in political

science (see, e.g., Broockman 2009; Eggers and Hainmueller 2009; Hopkins and Gerber 2009;

Caughey and Sekhon 2011; Gerber et al. 2011; Trounstine 2011; Eggers et al. 2013).

In this paper, we study what we call the Geographic Regression Discontinuity (GRD)

design, a design in which a geographic or administrative boundary splits units into treated

and control areas and analysts make the case that the division into treated and control areas

occurs in an as-if random fashion. One of the earliest and most famous examples of exploiting

geographic variation to estimate causal effects is the study by Card and Krueger (1994),

who estimated the effect of increasing the minimum wage on employment by comparing fast

food restaurants in New Jersey (where the minimum wage was increased) to restaurants in

adjacent eastern Pennsylvania. In political science, political boundaries are often associated

with variation in key treatments such as national or state institutions. For example, Posner

(2004) used the colonial border between Zambia and Malawi, which was drawn by the British

South Africa Company and split two different ethnic groups, to study the political salience of

cultural cleavages. Research designs based on geographic discontinuities are an increasingly

popular type of natural experiment in political science, and have been recently used to study

a variety of topics, including nation building, governance and ethnic relations in Africa

(Asiwaju 1985; Berger 2009; Laitin 1986; Miguel 2004; Miles 1994; Miles and Rochefort

1991; Posner 2004), media effects in Europe and the U.S. (Krasno and Green 2008; Huber

and Arceneaux 2007; Kern and Hainmueller 2008), local policies in U.S. cities (Gerber et al.

2011), and mobilization and polarization in the American electorate (Middleton and Green

2008; Nall N.d.).

In this article, we clarify the methodological difficulties and opportunities that may arise

in geographic applications of the RD framework. We first show that GRD designs lead to

identification of the local treatment effect at the boundary under a two-dimensional continu-

ity assumption which generalizes the seminal identification assumption in Hahn et al. (2001).

In this regard, the GRD design behaves as any other standard RD design with two scores

or running variables. However, applying the two-dimensional continuity assumption to ge-

ography produces some subtle but important differences. We highlight three in particular.

First, analysts who study GRDs often encounter compound treatments (multiple treatments

that affect the outcome of interest simultaneously) more frequently than those who study

non-geographic RD designs. Second, in a GRD design, different measures of distance from

the cutoffs require different identification assumptions. Third, spatial variation in treatment

effects can be mapped to specific locations, which can be used to detect geographic areas

where the identification assumptions are more (or less) likely to hold. In other words, un-

like non-geographic two-dimensional RDs, individual points on the boundary have a clear

interpretation.

Moreover, any method of inference applied to data from a GRD design must account

for possible spatial correlation. To that end, we propose applying nonparametric estimation

methods that are standard practice in the analysis of classical RD designs (see, e.g., Imbens

and Lemieux 2008) to GRD designs. Finally, we devote special attention to the practical

applicability of GRD designs, in light of the especially demanding restrictions that are required

when the RD assumptions are applied to a geographic context. We argue that the continuity

assumptions needed for identification will hold less often when applied to geography, because

when discontinuities are geographic, agents may sort very precisely around the boundaries

and undermine the validity of the design. To that end, we provide researchers with practical

advice on how to judge the plausibility of the key identifying assumptions. More generally,

we argue that considerable substantive knowledge is needed to credibly exploit geographic

boundaries as RD designs.

We illustrate our methodological framework and practical guidelines with an empirical

application that replicates the research design in Huber and Arceneaux (2007) and Krasno

and Green (2008). Following these previous studies, we use the exogenous variation in

the volume of TV ads that is created by media market boundaries to understand whether

campaign ads increase voter turnout. Using individual-level voter turnout data, we replicate

the previous finding that political ads seem to have no effect on turnout. Applying our

framework to this empirical application allows us to highlight and address important features

of GRD designs, including that media-market boundaries tend to be identical to county

boundaries and that the required assumptions are most likely to hold in non-battleground

states.

The manuscript is organized as follows. In the next section, we introduce the details

of our motivating empirical application. In Section 3, we discuss the related literature on

multidimensional RDs, formally state the identification assumptions and discuss estimation

strategies. In Section 4, we discuss the specific issues that arise in geographic applications of

the two-dimensional RD designs. We present our estimation framework in Section 5, and the

empirical results obtained from applying our GRD framework to the media markets example

in Section 6. In Section 7 we discuss recommendations for practice and conclude. We collect

additional discussions, derivations and analyses in the Appendix.

2 A Motivating Example

Presidential campaigns spend millions of dollars on television advertising both to persuade

voters to choose their candidate and to motivate them to turn out to vote on election day. Such

spending on TV ads, however, is not spread evenly across the United States. Candidates

spend heavily in so-called battleground states, states where the outcome of the presidential

election is expected to be close, while spending little or nothing in states that clearly favor one

candidate. Campaigns generally buy television advertisement by designated market areas

(DMAs), also known as “media markets.” DMAs are designated by Nielsen Media Research

for the purposes of measuring television ratings. Each DMA is an “exclusive geographic

area of counties in which the home market television stations hold a dominance of total

hours viewed.”1 A single DMA may include counties in more than one state, since residents

in a county can watch TV stations located in a neighboring state’s metropolitan area. For

example, the Chicago DMA includes not only counties in Illinois but also counties in Indiana.

There are a total of 210 DMAs in the United States, typically named after the city (or cities)

where the most viewed TV stations are located.

One prominent example of within-state media market variation, and one that we re-

analyze below, is in New Jersey. Southern New Jersey has been designated as part of the

Philadelphia DMA or media market. Northern New Jersey, however, is designated as part

of the New York City DMA. This variation leads to very different experiences for residents

of New Jersey during presidential campaigns. For example, during the 2008 presidential

campaign, we calculated that residents of New Jersey who were in the Philadelphia media

market were exposed to an average of 177 presidential campaign ads per day from September

1st until election day. By contrast, residents of New Jersey in the New York media market

saw no presidential ads at all during the same time period (Goldstein et al. 2008).

Huber and Arceneaux (2007) and Krasno and Green (2008) exploited this within-state

1 See Nielsen Media Research’s glossary of terms at http://www.nielsenmedia.com/glossary.

variation in media markets to study whether presidential campaign advertisement affects

turnout and political attitudes. Both studies compare voters in adjacent counties in the

same state that are in different media markets, where one media market had a high volume

of ads and the other media market had few or no ads. In both studies, the authors find little

evidence that being exposed to presidential campaign ads during the 2000 election increases

voter turnout. While these studies do not use detailed geographic data, the assumption in

both cases is that voters who live near a media market boundary are as-if randomized to

presidential TV ads exposure. In the following sections, we use this empirical application

to illustrate our methodological framework and present general features of the GRD design

that are likely to be encountered by practitioners in different subfields. But before turning

to practical issues, we discuss the connections between our approach and other studies on

RD with multiple scores and outline our methodological framework formally.

3 Geographic Regression Discontinuity Design

3.1 Setup and Notation

In a regression discontinuity design, assignment of a binary treatment, T , is a function of

a known covariate, S, usually referred to as the forcing variable or score. In the sharp RD

design, treatment assignment is a deterministic function of the score, where all units with

score less than the known cutoff are assigned to the control condition (T = 0) and all units

with scores above the cutoff are assigned to the treatment condition (T = 1). The crucial

aspect of the design is that the probability of receiving treatment jumps discontinuously at

the known cutoff, while all other factors related to the outcome vary smoothly.2

In a geographic-based RD design, we compare units in a treated area to units in a control

area, which we denote by At and Ac, respectively. We adopt the potential outcomes frame-

work and assume that unit or individual i has two potential outcomes, Yi1 and Yi0, which

2 See Imbens and Lemieux (2008) and Lee and Lemieux (2010) for comprehensive reviews of RD designs.

correspond to levels of treatment Ti = 1 and Ti = 0, respectively.3 In this context, Ti = 1

denotes that unit i is within At and Ti = 0 denotes that i is within Ac.4 In our empirical

application, At is an area in the state of New Jersey that is within the Philadelphia media

market, while Ac is a region in New Jersey that is within the New York media market.

Thus, in our example, Ti = 1 when individual i resides in the Philadelphia media market,

and Ti = 0 when he resides in the New York media market. We are interested in the effect

of treatment for unit i, τi = Yi1 − Yi0, where in our application the potential outcomes are

binary and they represent the decision to turn out to vote (or not). The observed outcome is

Yi = TiYi1 + (1− Ti)Yi0, and the fundamental problem of causal inference is that we cannot

observe both Yi1 and Yi0 simultaneously for any given unit, which implies that we cannot

recover the individual effect τi. However, under certain assumptions, we will be able to learn

about local averages of τi.
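The relationship between potential and observed outcomes can be illustrated with a minimal simulation sketch. The turnout probabilities, the function names, and the random assignment rule below are hypothetical choices made only for illustration; they are not taken from the application.

```python
import random


def observed_outcome(t, y1, y0):
    """Observed outcome Y_i = T_i * Y_i1 + (1 - T_i) * Y_i0."""
    return t * y1 + (1 - t) * y0


def simulate_units(n, seed=0):
    """Simulate binary potential outcomes (turnout) for n hypothetical units.

    The probabilities 0.45 and 0.50 are illustrative assumptions only.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = int(rng.random() < 0.45)  # potential turnout under control
        y1 = int(rng.random() < 0.50)  # potential turnout under treatment
        t = int(rng.random() < 0.5)    # toy assignment, not a geographic rule
        units.append({"t": t, "y0": y0, "y1": y1,
                      "y": observed_outcome(t, y1, y0)})
    return units
```

For every simulated unit only one of the two potential outcomes is revealed through `y`, which is why the individual effect Yi1 − Yi0 cannot be computed from observed data alone.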

3.2 Identification

We now discuss the assumptions needed to identify treatment effects in the GRD design.

Informally, a parameter is said to be identifiable if changing the value of the true parameter

that generated the data implies a different distribution of the observed data (see Matzkin

2007, §3.1). Identification for the GRD design holds under a central continuity assumption,

also considered by Imbens and Zajonc (2011) for the general case of RD designs with multiple

forcing variables. An example of an RD with two forcing variables is students taking two

exams (say, language and mathematics) and a rule that allows students to graduate when

their test scores on each exam exceed a particular threshold. As we formally outline below,

the central identification assumption for this two-dimensional RD design is equivalent to the

3 Throughout, we assume that the potential outcomes of one unit do not depend on the treatment of other units, sometimes called SUTVA, the Stable Unit Treatment Value Assumption (Cox 1958; Rubin 1986). But we discuss the issue of interference in Section A.1 in the Appendix and the related issue of compound treatments in Section 4.

4 Note that we are defining our unit of observation as individuals within geographic areas, which implies that the underlying manipulation we are considering is one where individuals are assigned to treated or control areas. An alternative would be to consider a cluster assignment whereby the geographic areas themselves are assigned to treatment or control. We do not pursue this alternative here, but discuss it more fully in Section A.2 in the Appendix.

identification assumption needed for GRD designs. In this regard, GRD designs are a special

case of an RD design with two arbitrary scores and cutoffs. In a GRD design, coordinate

systems like latitude and longitude are analogous to the scores on the math and language

exams.

We now state the two-dimensional continuity assumption for the GRD. We exploit the

spatial proximity to the border between Ac and At, and the fact that the treatment jumps

discontinuously along this boundary. To illustrate, in our empirical example we concentrate

on areas on either side of the boundary that separates the Philadelphia and New York media

markets, where the volume of ads changes discontinuously from very high to zero. We define

a score that uniquely represents unit i’s geographic location, and allows us to compute i’s

distance to any point on the border. We use vectors, in bold, to simplify the notation.

The geographic location of individual i is given by two coordinates such as latitude and

longitude, (Si1, Si2) = Si. We call the set that collects the locations of all boundary points

B, and denote a single point on the boundary by b, with b = (S1, S2) ∈ B. Thus, At and

Ac are the sets that collect, respectively, the locations that receive treatment and control.

The treatment assignment is a deterministic function of the score Si, and can be written

as Ti = T (Si), with T (s) = 1 for s ∈ At and with T (s) = 0 for s ∈ Ac. This assignment

has a discontinuity at the known boundary B. In addition, we assume throughout that

the density of Si, f(s), is positive in a neighborhood of the boundary B—an assumption

that, as we illustrate below, can be restrictive in geographic contexts. This setup, together

with the identification assumption below, formally constitutes the Geographic Regression

Discontinuity design.
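As a rough sketch of this setup, the code below represents a location as a (latitude, longitude) pair, computes the great-circle distance from a unit to a boundary point with the haversine formula, and assigns treatment as a deterministic function of location. The east-west boundary at latitude 40.0 and the function names are hypothetical; a real media market boundary would be an irregular line taken from geographic data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a standard approximation


def haversine_km(s, b):
    """Great-circle distance in km between location s and boundary point b.

    Both arguments are (latitude, longitude) pairs in decimal degrees.
    """
    lat1, lon1 = math.radians(s[0]), math.radians(s[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def treatment(s, boundary_lat=40.0):
    """Toy sharp assignment T(s): 1 south of a hypothetical east-west boundary.

    Locations exactly on the boundary are assigned 0 by convention here.
    """
    return 1 if s[0] < boundary_lat else 0
```

For example, `treatment((39.9, -74.5))` returns 1 and `treatment((40.1, -74.5))` returns 0, so the assignment jumps as a unit's location crosses the toy boundary.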

Assumption 1 (Continuity in two-dimensional score). The conditional regression functions

are continuous in s at all points b on the boundary:

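\[
\lim_{\mathbf{s} \to \mathbf{b}} E\{Y_{i0} \mid \mathbf{S}_i = \mathbf{s}\} = E\{Y_{i0} \mid \mathbf{S}_i = \mathbf{b}\}
\]

\[
\lim_{\mathbf{s} \to \mathbf{b}} E\{Y_{i1} \mid \mathbf{S}_i = \mathbf{s}\} = E\{Y_{i1} \mid \mathbf{S}_i = \mathbf{b}\},
\]

for all b ∈ B.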

This assumption requires that the average potential outcomes under treatment and con-

trol be continuous at all points on the boundary. In the context of our example, this means

that the average potential turnout that would be observed under low (high) advertisement

very near point b on the boundary between the New York and Philadelphia media markets,

is very similar to the average potential turnout that would be observed under low (high)

advertisement exactly at this boundary point, regardless of the direction in which we ap-

proach the boundary. In other words, the difference in the average potential turnout with

low/high advertisement between the boundary point b and a point very close to it can be

made arbitrarily small by getting increasingly closer to b.

Note that the probability of treatment jumps discontinuously along an infinite collection

of points – the collection of all points b ∈ B. This implies that the parameter identified under

this assumption is infinite-dimensional, as it is a curve on a plane. In other words, since

the cutoff is not a point but a boundary, the GRD design will identify the treatment effect

at each of the boundary points. This result is summarized in the following proposition,

where superscripts t and c are used to denote locations in the treated and control areas,

respectively, so that, for example, sc ∈ Ac and st ∈ At.

Proposition 1 (Geographic Treatment Effect Curve). If Pr(Ti = 1) = 1 for all i such that

si ∈ At and Pr(Ti = 0) = 1 for all i such that si ∈ Ac (the discontinuity is sharp), and

Assumption 1 holds then

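\[
\tau(\mathbf{b}) \equiv E\{Y_{i1} - Y_{i0} \mid \mathbf{S}_i = \mathbf{b}\}
= \lim_{\mathbf{s}^t \to \mathbf{b}} E\{Y_i \mid \mathbf{S}_i = \mathbf{s}^t\}
- \lim_{\mathbf{s}^c \to \mathbf{b}} E\{Y_i \mid \mathbf{S}_i = \mathbf{s}^c\}
\quad \text{for all } \mathbf{b} \in B.
\]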

(The proof follows Hahn et al. (2001); see Section A.3 in the Appendix for details.) In

other words, under the GRD assumptions, we can identify one (possibly different) treatment

effect τ(b) for every point b on the boundary, defining a treatment effect curve.5 In addition

to the parameter τ(b) = E {Yi1 − Yi0|Si = b}, researchers may be interested in the average

effect across all boundary points, τ = E {Yi1 − Yi0|Si ∈ B}. This average parameter could

be obtained by integrating the τ(b) effects over the entire boundary. As discussed by Imbens

and Zajonc (2011), we can write the average effect τ as

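\[
\tau = \int_{\mathbf{s} \in B} \tau(\mathbf{s})\, f(\mathbf{s} \mid \mathbf{S} \in B)\, d\mathbf{s}
= \frac{\int_{\mathbf{s} \in B} \tau(\mathbf{s})\, f(\mathbf{s})\, d\mathbf{s}}{\int_{\mathbf{s} \in B} f(\mathbf{s})\, d\mathbf{s}},
\]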

which can be easily recovered once the local effects τ(b) and the density f(b) are estimated

at multiple boundary points. This estimate can serve as a useful summary if researchers

wish to report an overall effect instead of, or in addition to, geolocated local estimates.
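The two quantities above can be sketched in code under strong simplifying assumptions. The sketch uses a kernel-weighted difference in means (a local constant estimator) around a boundary point, which is a simplification and not necessarily the estimator used later in the paper; local linear regression is the more common choice in the RD literature. Coordinates are assumed to be planar (projected), and the function names, the triangular kernel, and the bandwidth `h` are illustrative choices. The average effect is approximated by a density-weighted sum over equally spaced boundary points.

```python
import math


def triangular_kernel(d, h):
    """Weight that declines linearly from 1 at distance 0 to 0 at distance h."""
    return max(0.0, 1.0 - d / h)


def local_effect(units, b, h):
    """Kernel-weighted treated-minus-control mean difference near point b.

    units: iterable of (x, y, t, outcome) with planar coordinates (assumption).
    """
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # t -> [weighted outcome sum, weight sum]
    for x, y, t, outcome in units:
        w = triangular_kernel(math.hypot(x - b[0], y - b[1]), h)
        sums[t][0] += w * outcome
        sums[t][1] += w
    if sums[0][1] == 0 or sums[1][1] == 0:
        raise ValueError("no observations within the bandwidth on one side")
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


def average_effect(local_effects, densities):
    """Discrete approximation of the density-weighted average of tau(b)."""
    total = sum(densities)
    return sum(tau * f for tau, f in zip(local_effects, densities)) / total
```

The `ValueError` branch reflects the requirement that the density of locations be positive near the boundary: if one side has no observations within the bandwidth, no local contrast can be formed at that point.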

3.3 Related Literature on RD with Multiple Forcing Variables

There are a number of recent studies that discuss RD designs with multiple forcing variables

that are closely related to our project. Papay et al. (2011) were among the first to consider

the generalization of the RD design to multiple forcing variables. However, this study does

not discuss formal identification results, and considers cases where multiple scores assign

individuals to a range of different treatment conditions, as opposed to a single treatment as

we discuss here.

Dell (2010) uses a geographic discontinuity to study the effect of a forced labor mining

system in Peru and Bolivia (the mita) on household consumption and other measures. Her

approach is related to ours in that it incorporates geographic coordinates into the analysis,

but our approach differs from hers in important ways. First, Dell (2010) discusses identifi-

cation informally, and focuses mostly on estimation issues. In contrast, we focus on formal

identification issues and highlight specific threats that tend to arise in geographic based

designs. Second, her estimation strategy uses a two-dimensional score (latitude and longitude)

5 In Section A.4 in the Appendix, we provide alternative versions of Assumption 1 and Proposition 1 that collapse the two-dimensional score into a scalar measure of geographic distance.

that is the same for all units contained in the same cluster: the specifications include

a cubic polynomial in the latitude and longitude of each observation’s district capital, but

the unit of observation is the individual or household, with many individual observations

contained in each district. In contrast, our approach focuses on cases where the individual

observations are directly geo-referenced and individual geographic locations are directly in-

corporated into the analysis. Moreover, with clustered geographic coordinates one cannot

define the individual-level treatment assignment as a deterministic function of this score, as

in the traditional sharp RD, since there are no individual-level scores available, only cluster-

level scores. Another distinction is on the parameter of interest; while Dell (2010) captures

geographic treatment effect heterogeneity with boundary segment fixed effects, we focus on

identification and estimation of treatment effects at every boundary point, which captures

geographic heterogeneity in a more general way, as these effects for specific boundary points

can always be integrated to obtain the effect for any larger segment.

Finally, in an independent study, Imbens and Zajonc (2011) develop identification and

estimation results for the general case of the RD design with multiple forcing variables. This

study is similar to ours in that it discusses identification formally, and it also proposes local

regression estimation. But unlike these authors, who focus on general treatments based on

multidimensional scores, we focus specifically on geographic treatments and the particular

inferential obstacles and opportunities that arise in geographic applications.

We have pointed out the equivalency between the GRD design and the non-geographic,

two-dimensional RD design. But when it comes to defining the treatment assigned, calculat-

ing the distance to the cutoffs, interpreting treatment effects, and evaluating the plausibility

of the identification assumptions, GRD designs lead to very specific issues that do not often

arise in non-geographic contexts. We now turn to these differences.

4 Particularities of Geographic Regression Discontinuities

We highlight three areas in which the GRD design differs from the two-dimensional non-

geographic RD design: the possibility of compound treatments, the role of distance to the

boundary, and geographic treatment heterogeneity.

4.1 Compound Treatments

In GRD designs, we often are confronted with compound treatments, that is, a situation

in which two or more treatments that affect the outcome of interest occur simultaneously.

This poses a serious challenge if the researcher is interested in only one of those treatments

since, absent any restrictions or assumptions, it will not be possible to separate the effect

of the treatment of interest on the outcome from the effect of all other simultaneous “irrel-

evant” treatments. In geographic applications, compound treatments typically arise when

two or more geographically defined borders are located at the same place. Although this

phenomenon can also occur in standard RD designs (e.g., multiple policies may change when

a person reaches 65 years of age), it is much more common in geographic RD designs, since

the discontinuity of interest is typically the boundary of some administrative unit (a county,

an electoral district, a school district, etc.), and these boundaries often perfectly overlap

with each other; for example, county boundaries often coincide exactly with school district

boundaries.

As illustrated by our empirical application, compound treatments may pose a serious

challenge even when the boundary of interest is seemingly unrelated to administrative bound-

aries. Figure 1 contains a map of media markets in 2008 overlaid onto a map of U.S. counties.

Close examination of the map reveals that, with only a few exceptions, media market bound-

aries exactly follow county boundaries. That is, moving from one media market to another

implies moving from one county to another. In many places, U.S. House and other legislative

districts follow county boundaries as well. As a result, county, media market and legislative

district boundaries may all overlap in the same location.

Figure 1: Map of U.S. counties overlaid with designated media market boundaries creating compound treatments. Note: Gray lines indicate county borders and black lines indicate media market boundaries. Media market boundaries follow county borders except in rare instances.

Why does this matter? The outcome in our application is voter turnout. In many states,

counties are important units in terms of electoral administration, as county election officials

are often allowed to decide the number of polling stations, set precinct boundaries and enforce

voter identification laws. It is quite possible that turnout might differ at a county boundary

due to the specific features of the counties’ electoral administration. If media market and

county boundaries are identical, one will not be able to isolate the two, possibly different,

effects. Alternatively, if a congressional district border is also shared, a competitive House

race might also confound the media market effect on turnout.

When analysts face compound treatments, an additional assumption will be needed for

identification (see Hernan and VanderWeele 2011; VanderWeele 2009).6 We can state this

assumption formally as follows. We assume there are K binary treatments that occur simultaneously,

which we denote as Tij, j = 1, 2, . . . , K, for each individual i, with Tij ∈ {0, 1}.

We assume that only the kth treatment, Tik, is of interest. In the most general case, po-

tential outcomes will be affected by each of these simultaneously occurring treatments, and

isolating the effect of Tik will not be possible. We generalize our potential outcomes notation

to illustrate this general case, and let $Y_{i\mathbf{T}_i}$ be the potential outcome of individual i with

$\mathbf{T}_i = (T_{i1}, T_{i2}, \ldots, T_{ik}, \ldots, T_{iK})'$ a K-dimensional vector. This general notation allows all K

versions of treatment to affect the potential outcomes of individual i.

When the boundary of interest is simultaneously the boundary of multiple institutional,

administrative or political units and we wish to make inferences about the effect of only one

of these treatments, it may be appropriate to assume that the treatment of interest is the

only treatment that affects potential outcomes:

Assumption 2 (Compound Treatment Irrelevance). Assume the treatment of interest is

the kth treatment. For each i and for all possible pairs of treatment vectors Ti and T′i,

$Y_{i\mathbf{T}_i} = Y_{i\mathbf{T}'_i}$ if $T_{ik} = T'_{ik}$.

6 These authors use the term compound treatment to describe a situation in which there are multiple versions of the same treatment. We use the term slightly differently, to refer to situations in which there are entirely different treatments and all of them simultaneously affect the potential outcomes.

When the Compound Treatment Irrelevance assumption holds, the potential outcomes

are only a function of the treatment of interest, so $Y_{i\mathbf{T}_i} = Y_{iT_{ik}}$ and we can denote potential

outcomes simply as Yi1 and Yi0 corresponding, respectively, to Tik = 1 and Tik = 0. In our

application, there are at least two simultaneous treatments. The first, which we denote Ti1,

is the county treatment, which is all the ways in which county-level electoral administration

may affect voter turnout. But there is a second treatment, which we denote Ti2, that is the

media market, and captures all the ways in which being exposed to high presidential TV

advertising may affect voter turnout.

When the two boundaries overlap exactly, one way to make inferences about the

effect of Ti2 alone is to assume that $Y_{i(T_{i1},T_{i2})'} = Y_{iT_{i2}}$. In our example, this implies assuming

that there is no separate county effect on turnout, so that the county treatment can be exactly

reduced to the media market treatment.7 Another alternative is to define the estimand as

a compound treatment effect that includes both a media market effect and a county effect,

but for the purposes of our example this is unsatisfactory, because our substantive interest

is in media market effects isolated from county effects.
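A small numerical sketch shows why overlapping boundaries matter. It assumes, purely for illustration, that the media market and county treatments affect mean turnout additively; the additive form, the baseline turnout of 0.50, and the function name are hypothetical and are not part of the framework above.

```python
def boundary_contrast(media_effect, county_effect, baseline=0.50):
    """Treated-minus-control difference in mean turnout at a shared boundary.

    Assumes (for illustration only) that both treatments switch on together
    when crossing the boundary and that their effects add up.
    """
    treated_mean = baseline + media_effect + county_effect
    control_mean = baseline
    return treated_mean - control_mean
```

With no county effect, as under Compound Treatment Irrelevance, `boundary_contrast(0.02, 0.0)` recovers the media effect of 0.02. With a county effect of 0.03, the same contrast is about 0.05, and the data alone cannot reveal how that total splits between the two treatments.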

In some applications where units can be observed before and after the treatment of

interest occurs and all the “irrelevant” treatments occur in both periods, the differences

between treated and control areas in the first period could be used to infer the effect of the

irrelevant treatments on the outcome. Then, under appropriate assumptions, the irrelevant

effect observed in the first period could be subtracted from the overall effect observed in the

second period—when the treatment of interest occurs simultaneously with all the irrelevant

treatments—to isolate the effect of the treatment of interest on the outcome. Developing the

type of assumptions under which this “differences-in-differences” strategy is valid is beyond

the scope of this article, but we note that in general this strategy will require conditions that

specify the ways in which the treatment of interest and the irrelevant treatments affect the

7 Of course, the assumption $Y_{i(T_{i1},T_{i2})'} = Y_{iT_{i1}}$ also satisfies Assumption 2, since it allows us to reduce the media market treatment to the county treatment. However, since our substantive interest is in media market effects, this version of Assumption 2 is not helpful.


potential outcomes over time.

The ideal scenario is one where the Compound Treatment Irrelevance assumption can be

avoided altogether. In the media market application, we found one place where this holds:

a small area of Northern California along Lake Tahoe, where voters are part of the Reno,

Nevada, DMA. Citizens in this region of California get the full volume of presidential ads

from Nevada (a battleground state), while other California residents do not. In this case, the

Reno media market does not follow a county boundary and instead splits El Dorado county

in California into two media markets: the Reno DMA to the east, and the Sacramento DMA

to the west. Figure 2 displays this area of California.

Since the boundary between the Reno and Sacramento media markets splits El Dorado

county, the GRD estimate will isolate the media market effect from any county effect. The

dots in Figure 2 represent the locations of households with at least one registered voter.

Unfortunately, as the figure shows, the density of the data close to the media market border

is very low: there are almost no observations close to the boundary, which violates one of

the needed assumptions for implementation of a GRD design. In addition, rudimentary

comparisons between the observations on either side of the boundary show that those who

live along Lake Tahoe are wealthier than those who live outside the Reno DMA in El Dorado

county. Sales prices of homes in the Reno media market were on average nearly $200,000

higher than for homes in El Dorado county outside of the Reno DMA. Incomes were also

substantially higher for residents in the Reno media market. Thus, we observe a correlation

between income and the higher volume of presidential campaign ads, which will confound

the effect of ads on voter turnout. This is not surprising, given that the treated and control

groups on either side of the border are very far from each other.

Thus, despite the fact that we can isolate the precise media market effect, we do not

pursue the analysis in this area and focus on New Jersey instead. Nonetheless, this area in

California is useful to illustrate a case where the compound treatment irrelevance assumption

is not needed. Unfortunately, avoiding this extra assumption comes at too high a price in

[Figure 2 map: labels include the Reno/Sacramento media market boundary, the Nevada-California border, Lake Tahoe, El Dorado County, CA, and the neighboring Placer, Alpine, Amador, and Calaveras counties; scale bar in miles.]

Figure 2: Boundary between Reno and Sacramento media markets. The boundary between the Reno, NV, media market (located east of the boundary) and the Sacramento, CA, media market (located west of the boundary) splits California’s El Dorado County (yellow area). As a result, the media market treatment is isolated from the “county treatment.” 2008 volume of ads was high in Reno media market and low in Sacramento media market. Dots represent location of households with registered voters.

this example, since there is simply no data density close to the border.

The presence of compound treatments is very common in GRD designs. When compound

treatments occur, analysts will need to either find areas where a single treatment can be

isolated or make a case that the assumption of compound treatment irrelevance holds. As

we discuss in detail below, we are unable to avoid this assumption in the area of New Jersey

that we study.8

8 The Compound Treatment Irrelevance assumption is closely connected to the exclusion restriction in instrumental variables contexts, where researchers must assume that the instrument has an effect on the outcome only through the treatment of interest but does not affect the outcome directly. Any GRD design


4.2 Naive Distance

Often in the GRD design, the score S is defined as the shortest (i.e., perpendicular) distance

to the boundary, and units that are close to the boundary in terms of this distance but on

opposite sides of it are taken as valid counterfactuals for each other. In a GRD design of

this type, the analyst compares all units that are within a fixed distance from the border; see

Black (1999) for an example. Here, individual i has distance Si = d if the distance from i’s

location to the point on the boundary that is closest to i is equal to d. While this concept of

distance is well-defined in the context of geography, it makes little sense in the context of the

two-dimensional RD based on test scores. In the two-dimensional non-geographic RD, we

cannot sensibly define a shortest distance to the “boundary” between treated and control,

primarily because this boundary does not in any sense correspond to a pre-existing feature

of the world such as a county, district, DMA, etc. Thus, the possibility of computing this

shortest distance is another key distinction between the two designs.

Moreover, using the perpendicular distance to the boundary as the score may mask

important heterogeneity and may not allow researchers to fully evaluate the plausibility

of the identification assumptions in the GRD design. The problem is that this measure

of distance ignores the spatial nature of geographic locations. We refer to this distance

as “naive,” to distinguish it from the two-dimensional distance we introduce below, which

we term “geographic.” As illustrated in Figure 3, the shortest distance from individual

i’s location to the boundary does not determine the exact location of i in the map, since

two individuals i and j in different locations can both have Si = Sj = d. That is, this

naive distance does not account for distance along the border. As one can see in Figure

3, a naive implementation of the RD design along a geographic boundary that does not

take into account both dimensions would treat individuals i and j in the control area as

is ultimately focused on the effect of some treatment that occurs in the geographic unit, not on the effect of the geographic unit itself. In this sense, the geographic units in the analysis (media markets, in our example) may be seen as analogous to an instrument, and Assumption 2 may be seen as analogous to an exclusion restriction that requires that the only feature of the geographic units that affects the outcome is the presence or absence of the treatment of interest (political TV ads).


equally distant from individual k in the treatment area, when in fact j is much closer to

k than i. This problem will be exacerbated when the boundary is longer; in Figure 3, as

the boundary becomes longer, the distance between control unit i and treated unit k can be

made arbitrarily large even as Si = d remains constant, by moving i along the dotted line.
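The point can be checked numerically with a small sketch. The setup is hypothetical and chosen only for illustration: the boundary is the vertical line x = 0 in a plane, control units sit to the left, the treated unit sits to the right, and the coordinates and function names are our own assumptions.

```python
import math

# Hypothetical setup: the boundary is the line x = 0; control units have
# x < 0 and treated units have x > 0. Coordinates are in arbitrary units.

def naive_distance(point):
    """Perpendicular (shortest) distance from a point (x, y) to the line x = 0."""
    x, _ = point
    return abs(x)

def euclidean(p, q):
    """Straight-line distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

d = 1.0
unit_i = (-d, 50.0)  # control unit located far along the boundary
unit_j = (-d, 0.5)   # control unit located near k
unit_k = (d, 0.0)    # treated unit

# Both control units receive the same naive score S = d ...
same_score = naive_distance(unit_i) == naive_distance(unit_j) == d

# ... yet their two-dimensional distances to the treated unit k differ greatly.
dist_ik = euclidean(unit_i, unit_k)
dist_jk = euclidean(unit_j, unit_k)
print(same_score, dist_ik > dist_jk)
```

Increasing the second coordinate of `unit_i` moves it along the line of constant naive distance and makes `dist_ik` as large as desired while its score stays at d.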

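[Figure 3 diagram: a boundary separating the treatment area (At) from the control area (Ac), showing control units i and j, treated unit k, and the distance d.]

Figure 3: Failure of one-dimensional distance to identify boundary points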

Naturally, a naive strategy cannot recover τ(b) for any b ∈ B, except in special cases such

as when treatment effects are constant at all boundary points, i.e. τ(bp) = τ(bq) for all bp ∈

B,bq ∈ B.9 In contrast, no information is lost when geography is fully exploited in a GRD

approach: once the local effects τ(b) are estimated for all b ∈ B, researchers can compute any

average effect of interest. Moreover, a GRD approach allows researchers to detect possible

discontinuities in predetermined covariates on different points on the boundary, while these

local discontinuities might be masked in a naive approach.

9 One avenue for future research is to explore the conditions under which a naive design leads to a consistent estimator of the overall average effect τ, which will depend on how this design is implemented (for example, simple means on either side of the boundary within a narrow band versus local-polynomial on one-dimensional distance normalizing the cutoffs).


4.3 Spatial Treatment Effects

We note one final important difference between GRD designs and the two-dimensional non-

geographic RD design. As we noted in Section 3, the effect identified is not a point estimate

but a line of treatment effects along the border that separates the treated and control areas.

In the GRD design this leads to estimated effects that are spatially located, and these

treatment effects can be in principle heterogeneous. In the standard two-dimensional RD

design, this heterogeneity may be difficult to interpret, but in the GRD design, we can map

this heterogeneity to specific geographic locations to observe whether the treatment effect

varies along the geographic border of interest. In other words, a GRD can uncover interesting

patterns of geographic treatment effect heterogeneity that may have, for example, important

policy implications. Analysts should either articulate the pattern they expect

in the treatment effects or treat the study of such heterogeneity as exploratory. In the next

section, we develop an estimator that is faithful to the spatial nature of the GRD design.

5 Estimation in the GRD Design

We now provide an estimation framework that is well suited to the features of the GRD

design, and can be used to both estimate treatment effects and assess the plausibility of the

continuity assumptions. We generalize the local polynomial regression estimator commonly

used for estimation in one-dimensional RD designs (see Hahn et al. 2001, Porter 2003, and

Imbens and Lemieux 2008 for an overview). Our goal is to estimate a conditional expectation

of the outcomes as a function of the distance to the boundary. This estimate, however, needs

to be faithful to local spatial variation around the discontinuity of interest. In Section A.6 in

the Appendix, we discuss how our method relates to methods in statistical geography, such

as geographically weighted regression and the analysis of spatial autocorrelation.

First we define µ(x) = E(Y |X = x) as the regression function of the observed outcome

of interest Y on some univariate X. Assuming that the first p+ 1 derivatives of µ(X) at the


point X = x0 exist, we can approximate µ(x) in a neighborhood of x0 by a Taylor expansion:

$$\mu(x) \approx \mu(x_0) + \mu^{(1)}(x_0)(x - x_0) + \frac{\mu^{(2)}(x_0)}{2}(x - x_0)^2 + \cdots + \frac{\mu^{(p)}(x_0)}{p!}(x - x_0)^p,$$

where $\mu^{(j)}(x_0)$ denotes the $j$th derivative of $\mu(x)$ evaluated at $x_0$, for $j = 1, \ldots, p$.

In local regression estimation, this polynomial is fitted locally, minimizing a weighted

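sum of squared residuals. The estimated coefficients $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p)'$ are defined as

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \Big\{ Y_i - \sum_{j=0}^{p} \beta_j (X_i - x_0)^j \Big\}^2 w_i,$$

with weights $w_i = \frac{1}{h} K\left(\frac{X_i - x_0}{h}\right)$ for a given kernel function $K(\cdot)$ and bandwidth $h$. This yields

$\hat{\mu}^{(j)}(x_0) = j!\,\hat{\beta}_j$ as an estimator of $\mu^{(j)}(x_0)$, the $j$th derivative of $\mu$ at $x_0$, for $j = 0, 1, \ldots, p$. In particular, an estimator for

the conditional expectation of $Y$ given $X = x_0$ is given by $\hat{\mu}(x_0) = \hat{\beta}_0$. See Fan and Gijbels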

(1996) for an extensive discussion of local polynomial estimation.
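To make the estimator concrete, the following is a minimal sketch of a kernel-weighted polynomial fit around a point x0, written with the Python standard library only. The triangular kernel, the function names, and the simulated data are our own assumptions for illustration; it is not the implementation used in the application.

```python
def triangular_kernel(u):
    """Triangular kernel K(u) = max(0, 1 - |u|) (an illustrative choice)."""
    return max(0.0, 1.0 - abs(u))

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few observations near x0")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def local_polynomial(xs, ys, x0, h, p=1, kernel=triangular_kernel):
    """Return [beta_0, ..., beta_p] minimizing sum_i w_i * residual_i^2 at x0."""
    size = p + 1
    xtwx = [[0.0] * size for _ in range(size)]
    xtwy = [0.0] * size
    for x, y in zip(xs, ys):
        w = kernel((x - x0) / h) / h  # w_i = (1/h) K((X_i - x0) / h)
        if w == 0.0:
            continue
        powers = [(x - x0) ** j for j in range(size)]
        for r in range(size):
            xtwy[r] += w * powers[r] * y
            for c in range(size):
                xtwx[r][c] += w * powers[r] * powers[c]
    return solve_linear_system(xtwx, xtwy)

# Simulated, noise-free data from mu(x) = 1 + 2x + 3x^2.
xs = [k / 10 for k in range(-20, 21)]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]
beta = local_polynomial(xs, ys, x0=0.5, h=1.0, p=2)
mu_hat = beta[0]       # estimate of mu(x0)
mu1_hat = beta[1]      # estimate of the first derivative (1! * beta_1)
mu2_hat = 2 * beta[2]  # estimate of the second derivative (2! * beta_2)
```

Because the simulated regression function is itself a quadratic, a fit with p = 2 recovers its value and derivatives at x0 up to floating-point error; with real data the fit is only a local approximation.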

Using this local polynomial estimator, we borrow the basic estimation approach from RD

designs, which involves estimating the left and right limits of µ(c), denoted µl(c) and µr(c),

respectively, with a local polynomial of degree one, where c is a known cutoff in the score.

The estimation of µl(c) uses only observations to the left of c and, similarly, estimation of

µr(c) uses only observations to the right of the cutoff. For given weights wi and a scalar

score Si, this involves computing the weighted regression of the observed outcome Yi on a

constant and Si − c; the estimated effect is then $\hat{\tau} = \hat{\mu}_r(c) - \hat{\mu}_l(c)$.

We modify this standard estimation approach in several ways. For a given point b on the

boundary, we calculate a measure such as the Euclidean distance, which can accommodate

multiple dimensions, between the location Si of unit i and b. For every unit i in the sample,

this distance is defined as fb(Si). Letting


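$$\mu^c(\mathbf{b}) \equiv \lim_{\mathbf{s}^c \to \mathbf{b}} E\{Y_{i0} \mid f_{\mathbf{b}}(\mathbf{S}_i) = f_{\mathbf{b}}(\mathbf{s}^c)\}$$

$$\mu^t(\mathbf{b}) \equiv \lim_{\mathbf{s}^t \to \mathbf{b}} E\{Y_{i1} \mid f_{\mathbf{b}}(\mathbf{S}_i) = f_{\mathbf{b}}(\mathbf{s}^t)\},$$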

we estimate these functions by local linear regression. To do so, we solve

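$$(\hat{\alpha}^c_{\mathbf{b}}, \hat{\beta}^c_{\mathbf{b}}) = \arg\min_{\alpha^c_{\mathbf{b}},\, \beta^c_{\mathbf{b}}} \sum_{i \in A^c} \left\{ Y_i - \alpha^c_{\mathbf{b}} - \beta^c_{\mathbf{b}} \big( f_{\mathbf{b}}(\mathbf{S}_i) - f_{\mathbf{b}}(\mathbf{b}) \big) \right\}^2 w_{i\mathbf{b}}$$

$$(\hat{\alpha}^t_{\mathbf{b}}, \hat{\beta}^t_{\mathbf{b}}) = \arg\min_{\alpha^t_{\mathbf{b}},\, \beta^t_{\mathbf{b}}} \sum_{i \in A^t} \left\{ Y_i - \alpha^t_{\mathbf{b}} - \beta^t_{\mathbf{b}} \big( f_{\mathbf{b}}(\mathbf{S}_i) - f_{\mathbf{b}}(\mathbf{b}) \big) \right\}^2 w_{i\mathbf{b}},$$

where

$$w_{i\mathbf{b}} = \frac{1}{h_{\mathbf{b}}} K\left( \frac{f_{\mathbf{b}}(\mathbf{S}_i) - f_{\mathbf{b}}(\mathbf{b})}{h_{\mathbf{b}}} \right)$$

are a set of spatial weights where $K(\cdot)$ represents a kernel weighting function and $h_{\mathbf{b}}$ is

the bandwidth at the boundary point $\mathbf{b}$. Since $f_{\mathbf{b}}(\mathbf{b}) = 0$, all formulas above simplify

immediately. Given these solutions, the GRD effect is estimated as

$$\hat{\tau}(\mathbf{b}) = \hat{\mu}^t(\mathbf{b}) - \hat{\mu}^c(\mathbf{b}) = \hat{\alpha}^t_{\mathbf{b}} - \hat{\alpha}^c_{\mathbf{b}} \qquad (1)$$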
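The estimator in equation (1) can be sketched in a few lines. This is an illustration under stated assumptions rather than the code used in our application: locations are treated as planar coordinates with Euclidean distance, the kernel is triangular, the bandwidth is fixed by hand, and all names and data are hypothetical.

```python
import math

def triangular_kernel(u):
    """Triangular kernel K(u) = max(0, 1 - |u|) (an illustrative choice)."""
    return max(0.0, 1.0 - abs(u))

def boundary_intercept(units, b, h):
    """Local linear estimate of the mean outcome at boundary point b.

    units: list of ((x, y), outcome) pairs drawn from one side of the boundary.
    Returns the intercept of the kernel-weighted regression of the outcome
    on a constant and the distance to b.
    """
    sw = swd = swy = swdd = swdy = 0.0
    for location, outcome in units:
        dist = math.dist(location, b)        # f_b(S_i); note that f_b(b) = 0
        w = triangular_kernel(dist / h) / h  # spatial weight w_ib
        sw += w
        swd += w * dist
        swy += w * outcome
        swdd += w * dist * dist
        swdy += w * dist * outcome
    denom = sw * swdd - swd ** 2
    if denom <= 1e-12:
        raise ValueError("not enough variation in distance within bandwidth h")
    slope = (sw * swdy - swd * swy) / denom
    return (swy - slope * swd) / sw

def grd_effect(treated, control, b, h):
    """Estimated effect at b: treated intercept minus control intercept."""
    return boundary_intercept(treated, b, h) - boundary_intercept(control, b, h)

# Simulated example: boundary along x = 0, evaluation point b at the origin,
# outcomes linear in distance to b with a jump of 2.0 at the boundary.
b = (0.0, 0.0)
grid = [(x / 10, y / 5) for x in range(1, 11) for y in range(-2, 3)]
treated = [((x, y), 3.0 + 0.5 * math.dist((x, y), b)) for x, y in grid]
control = [((-x, y), 1.0 + 0.5 * math.dist((-x, y), b)) for x, y in grid]
tau_b = grd_effect(treated, control, b, h=1.0)
```

In the simulated data the outcome is exactly linear in the distance to b on each side, so the two intercepts recover the limits 3.0 and 1.0 and `tau_b` equals the jump up to floating-point error.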

Conventional confidence intervals are typically based on the standard asymptotic dis-

tribution of the least squares estimator and robust standard errors (see, e.g., Imbens and

Lemieux 2008). These confidence intervals, however, ignore the asymptotic bias of the non-

parametric local polynomial estimator, a simplification that is justified only if the bandwidth

in the estimator is chosen to be small enough (i.e., if one undersmooths). Common

bandwidth selection methods include cross-validation and mean-squared error (MSE)

minimization (see, e.g., Imbens and Kalyanaraman 2012), which in general lead to band-

width choices that are too large for conventional confidence intervals to be valid (Calonico,

Cattaneo, and Titiunik 2013c). In order to obtain valid inferences researchers may select a

smaller bandwidth to undersmooth, a procedure that is simple but ad-hoc, as there are no


general rules about how much undersmoothing should be done. An automatic, data-driven

alternative is to estimate the asymptotic bias ignored by conventional inference, and correct

the standard errors appropriately to produce robust confidence intervals that are valid even

for large bandwidths, including those selected by MSE minimization (Calonico, Cattaneo,

and Titiunik 2013c). In our application below, we sometimes found that the MSE optimal

bandwidths tended to be large, particularly in sparse areas around a boundary point. We

thus make inferences in two ways: (i) we select a fixed bandwidth, smaller than the MSE

bandwidth, which justifies using conventional confidence intervals, and (ii) we employ ro-

bust confidence intervals with MSE-optimal bandwidth. For implementation, we use the

rdrobust software.10

In practice, since the boundary B is an infinite collection of points, we selected a grid of

G points along the boundary for estimation, b1,b2, . . . ,bG. Under this grid of points, we

defined a series of treatment effects τ(bg) for g = 1, 2, . . . , G. Our estimation procedure thus

produces a collection of G treatment effects that can vary along the boundary that separates

the treatment and control areas, and in fact leads to a treatment effect curve, where each

effect can then be mapped in its specific location, bg. In addition, if researchers are interested

in the average effect along the entire boundary defined above, τ, this effect can be estimated

as $\hat{\tau} = \sum_{g=1}^{G} \hat{\tau}(\mathbf{b}_g) f(\mathbf{b}_g) / \sum_{g=1}^{G} f(\mathbf{b}_g)$, and its standard error can be obtained using either

resampling methods or the delta method; see Imbens and Zajonc (2011) for details.
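As a small illustration of the weighted average above, the sketch below combines pointwise effects on a grid using density values as weights. The numbers are made up, and the function name is hypothetical; how the density values are obtained is left outside the sketch.

```python
def average_boundary_effect(effects, densities):
    """Density-weighted average of the pointwise effects tau(b_g), g = 1..G."""
    if len(effects) != len(densities):
        raise ValueError("effects and densities must have the same length")
    total = sum(densities)
    if total <= 0:
        raise ValueError("densities must sum to a positive number")
    return sum(t * f for t, f in zip(effects, densities)) / total

tau_hat = [0.02, 0.01, -0.01]  # hypothetical estimates at three grid points
f_vals = [2.0, 1.0, 1.0]       # hypothetical density values at those points
avg_effect = average_boundary_effect(tau_hat, f_vals)
```

Grid points in densely populated stretches of the boundary receive more weight, so the average can differ noticeably from the simple mean of the pointwise effects.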

Since our method produces G different estimated treated-control differences, researchers

may face a multiple testing problem when G is large. Even if the true effect is zero, we will

expect to reject the null hypothesis α×G times if we use an α-level test. This multiple testing

problem is manageable, since there are well-known solutions such as the Bonferroni

correction or false discovery rates (Anderson 2008; Benjamini and Hochberg 1995).
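Both corrections are simple to apply to the G p-values obtained along the boundary. The sketch below is a generic illustration with made-up p-values; the function names are our own, and the second function implements the Benjamini and Hochberg (1995) step-up rule.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject hypothesis g when p_g <= alpha / G."""
    g = len(p_values)
    return [p <= alpha / g for p in p_values]

def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Step-up rule: find the largest rank k with p_(k) <= (k / G) * alpha
    and reject the k hypotheses with the smallest p-values."""
    g = len(p_values)
    order = sorted(range(g), key=lambda idx: p_values[idx])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / g * alpha:
            cutoff_rank = rank
    reject = [False] * g
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[idx] = True
    return reject

# Hypothetical p-values from G = 5 boundary points.
p_values = [0.001, 0.012, 0.025, 0.045, 0.2]
print(bonferroni_reject(p_values))          # [True, False, False, False, False]
print(benjamini_hochberg_reject(p_values))  # [True, True, True, False, False]
```

The Bonferroni rule controls the probability of any false rejection and is therefore the more conservative of the two, while the false discovery rate procedure tolerates a controlled share of false rejections and rejects more often.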

10 Software available at https://sites.google.com/a/umich.edu/rdrobust. See Calonico, Cattaneo, and Titiunik (2013b) for details on the STATA implementation, and Calonico, Cattaneo, and Titiunik (2013a) for details on the R implementation.


5.1 Falsification Tests

Before presenting the results from our application, we briefly discuss how to perform falsifica-

tion tests in the GRD design. It is now established practice in the analysis of RD designs to

provide empirical evidence that speaks to the credibility of the continuity assumption needed

for identification. Different types of evidence can be presented, but the most common is a

series of tests on predetermined or pretreatment covariates – covariates that are determined

before the treatment occurs and for which the treatment effect is zero by construction. We

review two different but related falsification tests for GRD designs.

First, predetermined covariates can be treated as outcomes using the local linear estima-

tor outlined above. Since the effect of the treatment is zero by construction, the estimated

effects should be statistically indistinguishable from zero, ideally with short confidence in-

tervals. In our example, we would hope to find that the estimated effect of presidential

campaigning on housing prices and other covariates at each boundary point is indistinguish-

able from zero. This same procedure can also be used to plot these placebo treatment effects

on a map for an assessment of geographical treatment heterogeneity. Ideally, the researcher

will choose to only study treatment effects at those points where the placebo analysis indi-

cates that pretreatment covariates are indistinguishable across treatment and control areas.

A second common type of falsification test based on pre-treatment covariates is a series of

“balance tests,” tests that investigate whether the mean (or other feature of the distribution)

of the covariates is statistically indistinguishable between treated and control units near the

cutoff. Strictly speaking, covariate balance in a small neighborhood around the geographic

boundary is expected under a local randomization assumption, but not necessarily under a

continuity assumption. See Cattaneo, Frandsen, and Titiunik (2013) for a discussion of the

differences between local randomization and continuity in the general RD design, and Keele

and Titiunik (2013) for a discussion in geographic contexts.

Assessing covariate balance in a GRD design, however, requires a different strategy since

covariate balance is a spatial construct. Under a naive approach, balance testing is straight-


forward since we can simply compare averages based on equal distances around the disconti-

nuity. That is, we could look at the average difference in housing prices for all houses within

100 meters from the media market border. We could also increase the distance around the

border to understand if balance changes as a function of distance, repeating the balance test

for all houses within 200 meters, 300 meters, etc. All things being equal, the credibility of

the design would be enhanced if balance improves as we get closer to the boundary.

The key difficulty with this geographically naive form of balance testing is that balance

might change as we move along the border. While balance may hold along one section of the

border, it may not hold along another section. Such geographic heterogeneity will be missed

in a geographically naive balance test. To that end, we developed a method for testing spatial

balance that compares nearest geographic neighbors. As we show later, balance tests based

on spatial proximity can differ substantially from balance tests based on naive distance.

We developed the following algorithm to assess spatial covariate balance:

• For treated unit i, calculate the geographic distance between it and all control units.

• Match unit i to the nearest control unit (or set of control units) in terms of this

geographic distance.

• Break ties randomly, so that each treated unit i is matched to a single control unit.

• Repeat for all treated units.

• Apply standard balance tests such as KS tests or t-tests to the spatially matched data.
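The steps above can be sketched as follows. This is a minimal illustration, not the implementation used in our application: it assumes planar coordinates (for example, projected meters) instead of the chordal distance, it allows a control unit to be matched to more than one treated unit, and it uses a hand-written paired t statistic as the balance test. All names and data are hypothetical.

```python
import math
import random

def match_nearest_control(treated, control, rng):
    """Match each treated unit to its nearest control unit.

    treated, control: lists of ((x, y), covariate) pairs.
    Ties in distance are broken at random using rng.
    Returns a list of (treated_index, control_index) pairs.
    """
    pairs = []
    for t_idx, (t_loc, _) in enumerate(treated):
        dists = [math.dist(t_loc, c_loc) for c_loc, _ in control]
        nearest = min(dists)
        tied = [c_idx for c_idx, dist in enumerate(dists) if dist == nearest]
        pairs.append((t_idx, rng.choice(tied)))
    return pairs

def paired_t_statistic(diffs):
    """t statistic for the null that the mean within-pair difference is zero."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        raise ValueError("within-pair differences have zero variance")
    return mean / math.sqrt(var / n)

# Hypothetical data: locations in meters and one pretreatment covariate.
treated = [((1.0, 0.0), 10.0), ((1.0, 5.0), 12.0), ((1.0, 10.0), 11.0)]
control = [((-1.0, 0.0), 9.0), ((-1.0, 5.0), 12.5),
           ((-1.0, 10.0), 11.5), ((-1.0, 100.0), 50.0)]

pairs = match_nearest_control(treated, control, random.Random(0))
diffs = [treated[t][1] - control[c][1] for t, c in pairs]
t_stat = paired_t_statistic(diffs)
```

The distant control unit with an extreme covariate value is never selected, which is the sense in which the comparison is restricted to spatially proximate pairs.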

A few issues about this form of balance test are worth noting. First, while this balance

test uses a matching algorithm, matching is only applied to distance, so the end result is

a set of spatially proximate pairs. Second, one can also combine this spatial balance

test with pruning of observations based on overall distance to the border. For example,

we might apply the spatial balance test only to voters that are within 200 meters of the

border. Third, this balance test, like all balance tests, is not a strict hypothesis test based


on statistical significance. As with any falsification test, we prefer not just statistically

insignificant estimates, but small differences that translate into both small point estimates

and large p-values.

All in all, whether falsification tests based on covariates lend credibility to the design

will depend on whether the covariates selected are closely connected to the outcome and

the treatment of interest, an issue that will depend on substantive knowledge behind each

particular application. Moreover, falsification tests based on balance should be interpreted

carefully, as there could simultaneously be evidence of continuity at the border and imbalance

near the border for these covariates. In these situations, researchers should pay attention to

the implementation of the estimation strategy, as the credibility of the results will hinge on

the credibility of the extrapolation to the boundary point.

6 Estimating the Effect of Campaign Ads on Voter Turnout

We now apply our GRD framework to the research design developed by Huber and Arce-

neaux (2007) and Krasno and Green (2008) and examine the effect of political television

advertisements on turnout during the 2008 presidential election, using the variation in the

volume of ads created by different media markets. We first describe the data we used for

the analysis, and then the geographic analytic tools that are essential to study this question

with a GRD design. We then explore the issue of compound treatments in this application,

and demonstrate how to make the assumption more plausible. Next, we explore whether

pretreatment covariates (variables that are determined before the volume of ads) differ sig-

nificantly near the media market border where the discontinuity in advertisement volume

occurs. This is equivalent to the requirement in the standard RD design that pretreatment

covariates be similar or “balanced” close to the cutoff, just as one would expect in a random-

ized experiment (see Lee 2008). The final part of our empirical analysis focuses on estimating

and mapping the turnout effects of presidential campaign advertisement.


6.1 Data

Our analysis is greatly aided by the form of our data, in particular the fact that they are

individual-level data with enough information to allow for geo-locating all units. We rely on two data

sources. Our main source is the New Jersey voter file. We purchased a version of this file

from the Catalist corporation. This dataset has measures of party registration, gender and

age directly from the voter file, and imputed values of education, income, poverty status,

and employment status.11 Most importantly, the voter file also contains the address of each

voter, which allows us to find each voter’s geographic location and avoid the use of naive

distances. Our second data source is property sales records. Data from housing sales have

a number of advantages. First, according to hedonic pricing theory, housing prices should

reflect a wide variety of neighborhood characteristics, including school quality (Sheppard

1999; Malpezzi 2002). Second, these data are not aggregated, which allows us to precisely

estimate how they vary around the boundary of interest. We acquired records for all houses

sold in the appropriate zip codes in New Jersey from January 2006 to November 2008. In

this time period, nearly 3,000 homes were sold in this area, although we used only the 1,800

house sales inside one specific school district (see below). The housing sales data allow us to

conduct a fine-grained analysis of the sales price differential along the boundary of interest.

We did not use Census data. Census units such as census block groups typically contain

between 600 and 3,000 people. Given the large size of block groups, it is often difficult to tell

whether there is meaningful spatial variation in block group level measures as one approaches

the boundary of interest. In addition, past turnout behavior might seem to be an important

pretreatment covariate to either condition on or use as a placebo outcome in our application.

However, past turnout is affected by similar ad differentials in past elections. As such, past

turnout cannot be considered a pretreatment covariate.

11 See Ansolabehere and Hersh (2012) for details about the features of voter files compiled by Catalist.


6.2 Geographic Analysis

We use Geographic Information Systems (GIS) software to process the data before the final

statistical analysis. Indeed, we believe that without GIS analysis the GRD design is signifi-

cantly weakened. GIS software allows analysts to more fully exploit geography and spatial

proximity. Here, we outline the geographic analysis we performed to implement the GRD

design in New Jersey.12

We can use certain GIS techniques to avoid the biases caused by aggregate data and

the modifiable areal unit problem (MAUP) (Openshaw 1984). MAUP refers to the fact

that areal units such as census tracts or blocks are often relatively arbitrary with respect

to the spatial variation of the units measured. Thus, aggregate measures do not accurately

reflect individual level phenomena unless those phenomena are spatially constant with respect

to the areal unit. It is well understood that MAUP introduces bias into estimates based

on aggregate units (Openshaw 1984; Wong 2008; Cohn and Jackman 2011; Reardon and

O’Sullivan 2004; Fotheringham and Wong 1991). One solution to MAUP, made possible

by recent advances in GIS, is to eschew aggregate data and give each individual a unique

neighborhood that is centered on the individual and includes the area around that specific

point. Such a measure varies with the location of the individual, which allows the measure

to vary spatially and accurately reflect the composition of the area around the individual. In

our case, since the voter file contains information about the geographic location of individual

voters, we can create and analyze spatial measures that are not subject to the bias caused

by the MAUP.

Next, we detail how we used GIS techniques. First, we geocoded both the voter file and

data on housing sales. Geocoding is the process of converting addresses into a coordinate

system, typically latitude and longitude.13 Geocoding allows us to know the distance between

voters and the media market boundary that forms the discontinuity of interest, and to develop

12 We performed all the geographic analysis in ArcGIS 9.3.

13 Geocoding involves taking formatted addresses and comparing them to a known database of addresses and street locations with an assigned geographic reference such as latitude and longitude.


a score that reflects the two dimensional geographic space.14 We also used GIS software for

two other tasks. First, we created what is called a buffer around the media market boundary.

The buffer is a spatial object that records which voters fall within a specified distance of a

geographic boundary. We used a buffer to identify which voters are within 100, 200, 300,

400, 500, 600, 700, 800, 900 and 1,000 meters from the border on either side. Second, we

used GIS to obtain a grid of points on the media market boundary for the calculation of

treatment effects. We did this by dividing the boundary into points defined by latitude and

longitude, spaced at selected intervals.
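The distance and buffer calculations can be approximated outside of GIS software. The sketch below is only an illustration under stated assumptions: it treats the Earth as a sphere with a mean radius of 6,371 km, computes the chordal distance mentioned in footnote 14 as the straight-line distance between points on that sphere, and approximates the distance to the boundary by the distance to the nearest grid point on it. The coordinates and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation (assumption)

def to_cartesian(lat_deg, lon_deg):
    """Convert latitude and longitude in degrees to 3-D coordinates in km."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (EARTH_RADIUS_KM * math.cos(lat) * math.cos(lon),
            EARTH_RADIUS_KM * math.cos(lat) * math.sin(lon),
            EARTH_RADIUS_KM * math.sin(lat))

def chordal_distance_km(p, q):
    """Straight-line (chord) distance between two (lat, lon) points."""
    return math.dist(to_cartesian(*p), to_cartesian(*q))

def geodetic_distance_km(p, q):
    """Great-circle distance on the sphere (haversine formula)."""
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def buffer_band(voter, boundary_points, band_edges_m):
    """Smallest band edge (in meters) that contains the voter, or None.

    The distance to the boundary is approximated by the chordal distance
    to the nearest point of a grid placed on the boundary.
    """
    nearest_m = 1000 * min(chordal_distance_km(voter, b) for b in boundary_points)
    for edge in band_edges_m:
        if nearest_m <= edge:
            return edge
    return None

# Hypothetical coordinates, not taken from the data.
voter = (40.30, -74.60)
boundary_grid = [(40.30, -74.598), (40.31, -74.598)]
bands = list(range(100, 1001, 100))  # 100, 200, ..., 1,000 meters
band = buffer_band(voter, boundary_grid, bands)
```

At the short distances relevant here the chordal and great-circle distances agree to well under a millimeter, so the choice between them matters mainly for the validity of spatial correlation calculations noted in footnote 14.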

6.3 Compound Treatment Reduction

Before calculating each unit’s distance to each of the boundary points, we discuss how to

reduce reliance on the compound treatment irrelevance assumption in this application. Figure 4 contains a map

of the state of New Jersey along with the location of the boundary between the New York

and Philadelphia media markets. We could calculate distances between voters and points

along the entire media market boundary and compare voters who are near each other along

this border. However, it is first important to reduce the incidence of compound treatments as

much as possible. To do that, we examined the boundaries of four different administrative

units: U.S. congressional districts, state senate districts, state house districts, and school

districts. We found that for many parts of the media market boundary, the boundaries of

at least one of these units overlapped perfectly with the media market boundary. In other

words, in various boundary segments, not only did the media market change at the boundary

but so did the school and/or the legislative districts. This is not entirely surprising since,

as we discussed above, the media market boundary in this area (and in most of the U.S.)

14 One might imagine that a simple application of the Euclidean distance with the points defined by latitude and longitude would be sufficient for calculation of the score in the GRD design. This would be appropriate if voters resided on a plane, but the Earth is a sphere. Naive Euclidean distances calculated between geographic locations can severely overestimate the distance (Banerjee 2005). There are two standard alternatives to the naive Euclidean distance: the geodetic and chordal distance. We used the chordal distance, which is a rescaling of the Euclidean distance and is very close to the geodetic distance for locations that are less than 2,000 km apart. The additional advantage of the chordal distance is that it allows for valid calculations of spatial correlations which the geodetic does not allow for (Banerjee 2005).


follows county boundaries.

[Figure 4 map: New Jersey (NJ) with neighboring PA and NY, Philadelphia, the New York-Philadelphia media market boundary, and the marked area of detail; scale bar in miles.]

Figure 4: Boundary between Philadelphia and New York City media markets. The dashed line represents the boundary between the Philadelphia, PA, media market (located south-west of the boundary) and New York City, NY, media market (located north-east of the boundary), which divides the state of New Jersey. Area of detail is where legislative districts and school district are constant on both sides of media market boundary. Figure 5 contains a detailed map of this area.

The overlap between media and county boundaries means that we cannot escape the

problem of compound treatments entirely, but we can minimize it by restricting our analysis

to those segments along the media market boundary where voters are in identical legislative

districts and school districts. Despite the length of the media market boundary, we found

only one short segment along the border where neither legislative district nor school district

boundaries also followed the county-media market boundary. The area of detail in


Figure 4 marks this boundary segment. Figure 5 contains a detailed map of this area. The

area marked with gray hash lines is the West Windsor-Plainsboro school district, which

is split in two by the county-media market boundary. This school district also lies within

a single U.S. House, state house, and state senate district. We restrict our analysis to

residents in this school district since we can more plausibly assume that areas on either side

of this segment of the media market border are comparable – though we test this assumption

empirically below. Thus, while we cannot avoid compound treatments in the application, by

finding an area where school and legislative districts are constant, we hope to increase the

plausibility of the compound treatment irrelevance assumption.

It is also worth noting that by holding units such as legislative and school districts

constant in order to reduce compound treatments, we are making our analysis conditional on

distance. That is, by removing compound treatments, we are already restricting our analysis

to areas within a specific distance of the border. We suspect that, in many applications of

the GRD design, addressing the compound treatment problem will prompt researchers to

indirectly condition on distance to the boundary and will be a useful first step to identify

areas that are similar along the discontinuity of interest.

6.4 Results

Despite the differences between the GRD design and non-geographic RDs, the analysis of

both designs should start in an identical fashion. Under both designs, researchers should

provide evidence about the plausibility of the identification assumption invoked using fal-

sification tests. We now report whether covariates are similar on either side of the media

market boundary segment we selected for analysis using a balance analysis. Given that

we have already conditioned on geographic proximity by selecting a segment of the border

where legislative and school districts overlap, we expect that covariates should already be

comparable.

We check balance on 18 covariates in the voter file and for house prices. Because we

analyze a large number of covariates in the voter file, instead of reporting individual-level

balance statistics, we use a global measure of balance developed by Hansen and Bowers

(2008). For housing prices, we report price per square foot to avoid capturing differences

in house sizes. We compare differences in pretreatment covariates in a number of different

ways. We begin with a raw comparison between treated and control areas for all voters and

house sales in the West Windsor-Plainsboro school district. We then wish to understand

whether balance improves depending on whether we use naive distance versus a measure of

distance that accounts for spatial locations.
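The distinction between naive and spatial distance can be made concrete with a minimal sketch. It assumes a spherical Earth with a mean radius of 6371 km, and the function names are hypothetical; it is meant only to illustrate how a chordal distance relates to the geodetic (great-circle) distance, not to reproduce the computations used in our analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not a value taken from the paper


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3-D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def chordal_km(p, q):
    """Straight-line distance through the sphere between two (lat, lon) points."""
    a, b = to_unit_vector(*p), to_unit_vector(*q)
    return EARTH_RADIUS_KM * math.dist(a, b)


def geodetic_km(p, q):
    """Great-circle distance on the sphere, recovered from the chord length."""
    half_chord = chordal_km(p, q) / (2.0 * EARTH_RADIUS_KM)
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, half_chord))
```

For two locations a few kilometers apart, the two functions return nearly identical values, which is why the choice between them matters little at the scale of a single school district.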

Table 1 contains the results from the balance analysis. We first discuss the results for

housing prices. The basic treated control difference in housing prices indicates a $22 per

square foot price difference. As expected, given that we have focused on a relatively homo-

geneous area, this difference is not large.15 We then restricted the balance analysis to sales

that were within 1000 meters of the media market boundary. Now the difference is a mere

$3 a square foot. For housing prices, it did not matter whether we conditioned on naive

distance or not; in each case we find that the house price differential along the boundary

is quite small. Given that house prices should be correlated with school quality and other

neighborhood characteristics, the balance we observe helps validate the GRD design.
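The naive buffer comparison for house prices amounts to a difference in means after discarding sales beyond a given distance from the boundary. The sketch below illustrates that calculation under simple assumptions: the record layout, the function name, and the sales values are all made up for illustration and are not our data.

```python
from statistics import mean


def abs_mean_diff(records, buffer_m=None):
    """Absolute treated-minus-control difference in mean price per square foot.

    records: iterable of (treated, dist_to_boundary_m, price_per_sqft) tuples.
    buffer_m: keep only sales within this distance of the boundary (None keeps all).
    """
    kept = [r for r in records if buffer_m is None or r[1] <= buffer_m]
    treated = [price for is_treated, _, price in kept if is_treated]
    control = [price for is_treated, _, price in kept if not is_treated]
    if not treated or not control:
        raise ValueError("need at least one sale on each side of the boundary")
    return abs(mean(treated) - mean(control))


# Made-up sales, for illustration only.
sales = [
    (True, 150.0, 230.0), (True, 800.0, 226.0), (True, 2500.0, 245.0),
    (False, 300.0, 228.0), (False, 900.0, 222.0), (False, 3200.0, 190.0),
]
raw_diff = abs_mean_diff(sales)            # all sales in the district
near_diff = abs_mean_diff(sales, 1000.0)   # sales within a 1000 m buffer
```

In this toy example the difference shrinks once distant sales are dropped, which is the pattern a buffer analysis is designed to reveal.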

Next, we examine balance in the voter file covariates. Here, we find that using naive

distance versus geographic distance matters. While balance improves as we near the border,

the imbalance as measured by the global χ2 statistic is nearly half when we use geographic

distance as opposed to naive distance. While the global measure does little to convey in-

dividual characteristics, close examination reveals that registered voters in the treated and

control areas are predominantly white and have similar levels of education and income. Our

analysis reveals that voters along this segment of the media market boundary are well bal-

anced in terms of pretreatment covariates, which lends credibility to the design. We note,

15 The average in the New York media market area is $211, and the average in the Philadelphia media market is $233. The standard deviation for the entire area is $37. So this $22 difference is less than a standard deviation.

however, that geographic distance does not remove all the imbalances in observed covariates.

This is a common phenomenon in geographic applications, where conditioning on distance

often improves balance but does not eliminate imbalances entirely; see Keele, Titiunik, and

Zubizarreta (2014) for a matching estimator that conditions on covariates and distance si-

multaneously and allows for a sensitivity analysis, and Keele and Titiunik (2013) for an

example with conditioning based on a regression model.

We have seen thus far that we can improve balance considerably by analyzing units that

are spatially proximate. While those results are encouraging, they require us to average

along the entire segment of the media market boundary that we have selected, which is

approximately 7 kilometers long. However, since spatial heterogeneity may occur along

these 7 kilometers, the design (i.e., its identification assumptions) might be more credible

in some parts of the border than others. And even if the design is equally credible along

the entire segment of the border we analyze, the effects of presidential advertisement on

turnout may vary along the border – that is, we might have geographically heterogeneous

treatment effects. We should note, however, that we do not have any strong reason to suspect

heterogeneity. As such, the analysis is exploratory.

We now turn to the results of our local regression estimator introduced in Section 5. We

use this estimation framework to first assess the validity of the design at various points along

the border, and then to estimate the treatment effect of interest and probe its heterogeneity.

For this, we select three different points along the border. The two extreme points are

chosen so that they split the boundary in three equal segments, each of which is roughly 2.3

kilometers long. The third point is the midpoint between the two end points. The distance

between each of the extreme points and the middle point is therefore approximately 1.15

kilometers. For this reason, whenever data density permits, we include results based on a

fixed bandwidth of 1.1 km, which ensures that each observation is used for estimation in

exactly one boundary point.
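To make the estimation at a single boundary point concrete, the following is a minimal sketch of a local linear fit with triangular kernel weights on each observation's distance to that point. It is an illustration under simplifying assumptions, not the rdrobust implementation: the function names are hypothetical, distances are taken as already computed, and no standard errors or bias correction are produced.

```python
def triangular_weight(dist_km, h_km):
    """Triangular kernel: weight falls linearly from 1 at distance 0 to 0 at h_km."""
    return max(0.0, 1.0 - dist_km / h_km)


def boundary_intercept(dist_km, y, h_km):
    """Weighted least-squares line of y on distance; the intercept estimates the
    expected outcome as the distance to the boundary point goes to zero."""
    rows = [(triangular_weight(d, h_km), d, v) for d, v in zip(dist_km, y)]
    rows = [(w, d, v) for w, d, v in rows if w > 0.0]
    if len(rows) < 2:
        raise ValueError("need at least two observations inside the bandwidth")
    sw = sum(w for w, _, _ in rows)
    d_bar = sum(w * d for w, d, _ in rows) / sw
    y_bar = sum(w * v for w, _, v in rows) / sw
    sxx = sum(w * (d - d_bar) ** 2 for w, d, _ in rows)
    if sxx == 0.0:
        raise ValueError("need at least two distinct distances inside the bandwidth")
    slope = sum(w * (d - d_bar) * (v - y_bar) for w, d, v in rows) / sxx
    return y_bar - slope * d_bar


def grd_effect(dist_tr, y_tr, dist_co, y_co, h_km=1.1):
    """Treated-minus-control difference in the two boundary intercepts."""
    return (boundary_intercept(dist_tr, y_tr, h_km)
            - boundary_intercept(dist_co, y_co, h_km))


# Noise-free made-up data: outcomes are exactly linear in distance on each side.
d_tr = [0.1, 0.3, 0.5, 0.9, 1.5]
y_tr = [0.60 + 0.10 * d for d in d_tr]
d_co = [0.2, 0.4, 0.7, 1.0, 2.0]
y_co = [0.55 - 0.05 * d for d in d_co]
effect = grd_effect(d_tr, y_tr, d_co, y_co, h_km=1.1)
```

Observations farther than the bandwidth from the boundary point receive zero weight, so with a bandwidth of 1.1 km the observations at 1.5 km and 2.0 km in this toy example do not enter the fit.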

Table 1: Covariate Balance Across New York and Philadelphia Media Markets as a Function of Distance

                                      Raw          1000m        500m      200m      100m
                                      Comparison   Buffer (a)   Buffer    Buffer    Buffer

Eighteen covariates
  Naive
    χ2 Statistic (voters) (b)         18921        3175         1094      217       49
    Tr Sample size                    15000        1880         731       170       36
    Co Sample size                    9460         1339         367       49        15
  Geographic
    χ2 Statistic (voters) (b)         –            2660         732       96        29
    Tr Sample size                    –            1339         367       49        15
    Co Sample size                    –            1339         367       49        15

Housing Price
  Naive
    Price Sq. Ft. Diff (houses) (c)   $22          $2.95        $2.93     –         –
    Tr Sample size                    1024         86           27        –         –
    Co Sample size                    774          77           14        –         –
  Geographic
    Price Sq. Ft. Diff (houses) (c)   –            $3.54        $2.82     –         –
    Tr Sample size                    –            86           27        –         –
    Co Sample size                    –            86           27        –         –

Note: (a) A buffer is a specified distance around the media market boundary. For example, with a 500m buffer all voters who live more than 500 meters from the boundary are removed from the analysis before matching on geographic distance occurs. (b) The metric is the χ2 test statistic from a global balance test applied to 18 pretreatment covariates; the analysis is performed at the voter level (each individual observation used is a voter). (c) Price Sq. Ft. Diff is the absolute value of the difference in the average house price per square foot; the analysis is performed at the house level (each individual observation used is a house). Rows labeled Naive show the unadjusted mean difference between treatment and control areas included in the buffer. Rows labeled Geographic show the mean difference between treatment and control areas included in the buffer after nearest-neighbor matching on chordal (spatial) distance alone.

First, to assess the validity of our design, we apply the local linear estimator to covariates

other than the outcome to identify segments along the border where the assumption

of continuity of potential outcomes (Assumption 1) seems most plausible. We report the

results for house prices, a covariate that we think is extremely important because it cap-

tures both individual-level and neighborhood-level characteristics that are correlated with

turnout – but we also report results for other covariates such as age and party registration

in the Appendix. Table 2 reports the media market effects on housing prices, estimated in

the three different points using local linear regression. As before, treatment is defined as

being in the Philadelphia media market and control as being in the New York media market.

The Estimate column contains the difference between the limit of the expected outcome for

treated and control units at the boundary point, as defined in Equation 1, estimated with a

fixed bandwidth of 1.7 km (data density was too low to consider a 1.1 km bandwidth). We

report the two types of confidence intervals described in Section 5. In the Conventional In-

ference columns, we calculate conventional 95% confidence intervals for the fixed bandwidth

of 1.7 km. In the Robust Inference columns, we report the robust 95% confidence interval

calculated with a MSE-optimal bandwidth at each point. The point estimates are negative

in the three boundary points, but in all cases the confidence intervals include zero. Thus, we

find no evidence of a significant discontinuity in housing prices at any of the boundary points

we consider. We note, however, that the confidence intervals are wide, which may be due

to the fact that the number of house transactions that occur near the border is relatively

small. Indeed, the data density at the house level is much lower than at the voter level,

which we analyze next. This illustrates an obstacle that is commonly encountered in RD

designs: the number of observations near the cutoff is typically small and leads potentially

to low statistical power to reject the null hypothesis. This challenge may be exacerbated in a

geographic RD design, where researchers require data density in two dimensions as opposed

to one.

Second, in Table 3, we report the effects of presidential ads on voter turnout in 2008

for the same three boundary points. The columns in this table are analogous to those in

Table 2: Placebo effect of presidential ads on housing price per sqft

                        Conventional Inference                  Robust Inference
Boundary
Point      Estimate     95% CI              h      NTr   NCo    95% CI              h      NTr   NCo
1          -36.87       [-102.06, 28.32]    1.70   54    73     [-127.18, 46.81]    2.13   74    102
2          -48.23       [-105.86, 9.41]     1.70   85    48     [-141.92, 41.75]    1.87   91    56
3          -7.88        [-67.13, 51.36]     1.70   26    40     [-125.40, 72.04]    1.73   28    42

Note: Results estimated with local linear regression with triangular kernel weights on each observation’s chordal distance to the point of estimation. Estimate indicates the point estimate (difference in price per square foot) using the fixed bandwidth h in the conventional inference column; NTr and NCo indicate the effective sample size used for estimation in the treated and control areas, respectively; h indicates the bandwidth used. All results estimated with package rdrobust (Calonico, Cattaneo, and Titiunik 2013b,a). In conventional inference, bandwidth is chosen manually. In robust inference, all bandwidths are chosen with CCT MSE minimization method described in Calonico, Cattaneo, and Titiunik (2013c). MSE-optimal pilot bandwidths used in robust confidence intervals are 3.22, 2.59, and 2.55 in boundary points 1, 2 and 3, respectively.

Table 2. The results are fairly consistent across the three boundary points considered. Point

estimates vary between 0.9 (Point 2) and 6.6 (Point 1) percentage points, but in all cases the 95%

conventional confidence intervals are roughly centered around zero and have associated p-

values of roughly 0.80 and above. The robust 95% confidence intervals include zero in every

boundary point as well, although in Point 1 the zero is barely included. This near positive

result, however, is not very stable.16 Taken together, these results suggest that presidential

ads have little effect on voter turnout during the 2008 presidential campaign, consistent with

the national-level findings in Huber and Arceneaux (2007) and Krasno and Green (2008).

In Figure 6, we plot the location of these treatment effects on a map. On the map,

the reader can see the points along the border where we estimated the media market effect

on turnout. The estimates at these points are limits of conditional expectations, estimated

with spatial weights that give stronger weight to voters near the boundary and less weight

to voters farther from the boundary. Given the relatively homogeneous nature of this area, we

do not expect much heterogeneity.

16 For example, choosing the MSE-optimal bandwidth with the method of Imbens and Kalyanaraman (2012) rather than the CCT method described in Calonico, Cattaneo, and Titiunik (2013c), the robust 95% confidence interval is [−0.41, 0.12].

Table 3: Effect of presidential ads on 2008 Turnout

                        Conventional Inference                  Robust Inference
Boundary
Point      Estimate     95% CI              h      NTr   NCo    95% CI               h      NTr    NCo
1          0.066        [-0.72, 0.85]       1.10   574   233    [-0.0006, 0.1490]    6.49   9460   13801
2          0.009        [-0.27, 0.29]       1.10   342   401    [-0.1815, 0.0310]    2.90   3222   3994
3          0.050        [-0.33, 0.43]       1.10   326   324    [-0.0998, 0.2624]    3.36   6163   5075

Note: Results estimated with local linear regression with triangular kernel weights on each observation’s chordal distance to the point of estimation. Estimate indicates the point estimate (difference in turnout proportion across treated and control areas) using the fixed bandwidth h in the conventional inference column; NTr and NCo indicate the effective sample size used for estimation in the treated and control areas, respectively; h indicates the bandwidth used (in km). All results estimated with package rdrobust (Calonico, Cattaneo, and Titiunik 2013b,a). In conventional inference, bandwidth is chosen manually. In robust inference, all bandwidths are chosen with CCT MSE minimization method. MSE-optimal pilot bandwidths used in robust confidence intervals are 6.15, 4.22 and 3.25 in boundary points 1, 2 and 3, respectively.

The possibility of plotting both actual and placebo effects on a map is one of the dis-

tinctive and, we believe, extremely useful features of the GRD design. This ability to plot

provides researchers with a summary of the geographic heterogeneity in both the treatment

effects and the plausibility of the continuity conditions that are needed for the GRD design

to yield valid inferences.

7 Recommendations for practice and concluding remarks

We now summarize our recommendations for analysts who wish to use geographic disconti-

nuities to estimate treatment effects, and offer some concluding remarks about the promise

and limitations of the methodological framework used here.

• Data. Much of the analysis we presented depended on having data that can be geo-

referenced. Without information about geographic locations, boundaries cannot be

fully exploited as discontinuities. Thus, researchers need to collect geographic data

(addresses, latitude and longitude, or other geographic information that can be used

for geocoding) along with more traditional covariates. Moreover, qualitative research

on the history of the border and conditions around it will often prove useful to justify

various assumptions.

• Falsification tests. Following standard practice in non-spatial RD designs, the credi-

bility of the design can be enhanced by providing evidence that pretreatment covariates

become more and more similar as the distance to the border decreases. Identification

for the GRD design requires that people cannot precisely sort around the boundary

in a way that makes potential outcomes discontinuous. In many GRD designs, we

expect that people will be able to sort very precisely around the boundary of inter-

est. For example, features such as the quality of schools and the price of housing

may vary discontinuously at the border of interest. Thus, researchers should at least

rule out nonzero treatment effects on predetermined covariates, which can be easily

implemented using covariates as outcomes in the estimation for each boundary point.

• Isolating the treatment. As discussed extensively above, compound treatments are

common in GRD designs. In political science applications, a first step is to restrict

the analysis to areas around the border where other important geographically-defined

institutional units are kept constant on either side of the border. This includes units

such as legislative districts, school districts, counties, cities, municipalities, states and

countries. When not all the relevant units are kept constant on either side of the

border, researchers should evaluate whether a plausible Compound Treatment Irrele-

vance assumption can be made. Placebo outcomes will be helpful to assess whether

this assumption is plausible. For example, researchers could show that the outcome of

interest was similar among units on both sides of the border before the treatment was

introduced.

• Appropriate analysis. The statistical analysis for the best GRD designs should be

relatively simple. Analysts should assess balance on pretreatment covariates for groups

on either side of the border. If covariates are balanced, a simple comparison of the

outcomes for those on either side of the border will be adequate if the border segment is

short and there is little concern about heterogeneity. On the other hand, if compound

treatments are present or covariates are imbalanced, more complex analysis will be

required. In these cases, placebo maps or tables showing geographically-located effects

of the treatment on predetermined covariates at various points along the border will be

important to identify segments of the border where the GRD’s continuity assumptions

are most plausible. The estimation framework we proposed in Section 5 is well suited

for this purpose.
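The falsification step recommended above, in which pretreatment covariates are used as outcomes at each boundary point, can be organized as a simple loop. The sketch below is only illustrative: the data layout, the names, and the stand-in estimator (a difference in means within a bandwidth rather than a local linear fit) are all assumptions made for the example.

```python
from statistics import mean


def local_mean_diff(rows, covariate, point, h_km):
    """Stand-in estimator: treated-minus-control difference in means among units
    within h_km of the boundary point (not a local linear fit). Raises
    statistics.StatisticsError if one side has no units inside the bandwidth."""
    near = [r for r in rows if r["dist_km"][point] <= h_km]
    treated = [r[covariate] for r in near if r["treated"]]
    control = [r[covariate] for r in near if not r["treated"]]
    return mean(treated) - mean(control)


def placebo_table(rows, covariates, points, h_km, estimator=local_mean_diff):
    """Treat each pretreatment covariate as the outcome at each boundary point."""
    return {(cov, pt): estimator(rows, cov, pt, h_km)
            for cov in covariates for pt in points}


# Made-up voters with distances (km) to two hypothetical boundary points.
voters = [
    {"treated": True,  "age": 44, "dist_km": {"b1": 0.4, "b2": 2.0}},
    {"treated": True,  "age": 50, "dist_km": {"b1": 1.9, "b2": 0.6}},
    {"treated": False, "age": 46, "dist_km": {"b1": 0.7, "b2": 2.2}},
    {"treated": False, "age": 47, "dist_km": {"b1": 2.4, "b2": 0.3}},
]
placebo = placebo_table(voters, ["age"], ["b1", "b2"], h_km=1.1)
```

Passing the estimator as an argument keeps the loop independent of the estimation method, so the same structure could be used with a local linear estimator in place of the stand-in.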

All in all, we believe that geographic discontinuities are a promising form of natural experi-

ment. The biggest challenge is that often agents are able to sort very precisely around the

boundary that forms the discontinuity in the design. Compound treatments will tend to in-

crease sorting as there will be a larger number of reasons for agents to sort at the boundary.

Moreover, an inability to avoid compound treatments can make defining the exact estimand

difficult. As such, while the GRD design is a particular case of the two-dimensional RD

design, geography creates a number of complications that are not necessarily common in

non-geographic designs. We have provided a framework that we believe will allow analysts

to address these issues. The implementation of the GRD design requires careful attention

to both the statistical analysis needed to justify the continuity assumption at the center of

the design and the geographic analysis needed to fully assess and exploit the discontinuity.


[Map: New York and Philadelphia media markets, the treated and control areas of analysis, and the surrounding school districts, including West Windsor-Plainsboro.]

Figure 5: Detail of the boundary between Philadelphia and New York City media markets. Area in gray marks the West Windsor-Plainsboro school district, which straddles the media market boundary. Empirical analysis is confined to the West Windsor-Plainsboro school district only, where legislative districts are also the same on both sides of the border. Treated area is south-west of media market boundary, inside Philadelphia media market where volume of political ads is high; control area is north-east of media market boundary, inside New York City media market, where volume of ads is zero.

[Map: New York and Philadelphia media markets, the treated and control areas of analysis within the West Windsor-Plainsboro School District, and three boundary points labeled with: Point Estimate: 0.05, p-value: 0.93; Point Estimate: 0.01, p-value: 0.94; Point Estimate: 0.05, p-value: 0.86.]

Figure 6: Geographically-located estimated advertisement effects on 2008 voter turnout. Treatment effects estimated at three different points along the boundary between the Philadelphia, PA, media market (located south-west of the boundary) and New York City, NY, media market (located north-east of the boundary) in the state of New Jersey (see Figure 4). Results estimated with local linear regression with triangular kernel weights on each observation’s chordal distance to the point of estimation and bandwidth fixed at 1.1 km. Area marked with gray hash lines is the West Windsor-Plainsboro school district, which straddles the media market boundary. Empirical analysis is confined to the West Windsor-Plainsboro school district only, where legislative districts are also the same on both sides of the border. Treated area is south-west of media market boundary, inside Philadelphia media market where volume of political ads is high; control area is north-east of media market boundary, inside New York City media market, where volume of ads is zero.

References

Anderson, Michael L. 2008. “Multiple Inference and Gender Differences in the Effects of Early Intervention: A Reevaluation of the Abecedarian, Perry Preschool, and Early Training Projects.” Journal of the American Statistical Association 103 (484): 1481–1495.

Ansolabehere, Stephen D. and Eitan Hersh. 2012. “What Big Data Reveal About Survey Misreporting and the Real Electorate.” Political Analysis 20 (4): 437–459.

Asiwaju, A.I. 1985. Partitioned Africa: Ethnic Relations an Africa’s International Boundaries, 1884-1984. London: C. Hurst.

Banerjee, Sudipto. 2005. “On Geodetic Distance Computations in Spatial Modeling.” Biometrics 61 (2): 617–625.

Benjamini, Yoav and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of The Royal Statistical Society Series B 57 (1): 289–300.

Berger, Daniel. 2009. “Taxes, Institutions and Local Governance: Evidence from a Natural Experiment in Colonial Nigeria.” Unpublished Manuscript.

Berk, Richard A. 2006. Regression Analysis: A Constructive Critique. Thousand Oaks, CA: Sage Publications.

Black, Sandra E. 1999. “Do Better Schools Matter? Parental Valuation of Elementary Education.” The Quarterly Journal of Economics 114 (2): 577–599.

Broockman, David E. 2009. “Do Congressional Candidates Have Reverse Coattails? Evidence from a Regression Discontinuity Design.” Political Analysis 17 (4): 418–434.

Calonico, Sebastian, Matias Cattaneo, and Rocio Titiunik. 2013a. “rdrobust: An R Package for Robust Inference in Regression-Discontinuity Designs.” Unpublished Manuscript.

Calonico, Sebastian, Matias Cattaneo, and Rocio Titiunik. 2013b. “Robust Data-Driven Inference in the Regression-Discontinuity Design.” Unpublished Manuscript.

Calonico, Sebastian, Matias Cattaneo, and Rocio Titiunik. 2013c. “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs.” Unpublished Manuscript.

Card, David and Alan B. Krueger. 1994. “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania.” The American Economic Review 84 (4): 772–793.

Cattaneo, Matias, Brigham Frandsen, and Rocío Titiunik. 2013. “Randomization Inference in the Regression-Discontinuity Design: An Application to Party Advantages in the U.S. Senate.” Unpublished Manuscript.

Caughey, Devin and Jasjeet S. Sekhon. 2011. “Elections and the Regression Discontinuity Design: Lessons from Close U.S. House Races, 1942-2008.” Political Analysis 19 (4): 385–408.

Cohn, Molly J. and Saul P. Jackman. 2011. “A Comparison of Aspatial and Spatial Measures of Segregation.” Transactions in GIS 14 (1): 47–66.

Cook, Thomas D., William R. Shadish, and Vivian C. Wong. 2008. “Three Conditions Under Which Experiments and Observational Studies Produce Comparable Causal Estimates: New Findings from Within-Study Comparisons.” Journal of Policy Analysis and Management 27 (4): 724–750.

Cox, David R. 1958. Planning of Experiments. New York: Wiley.

Dell, Melissa. 2010. “The Persistent Effects of Peru’s Mining Mita.” Econometrica 78 (6): 1863–1903.

Eggers, Andrew C., Olle Folke, Anthony Fowler, Jens Hainmueller, Andrew B. Hall, and James M. Snyder. 2013. “On The Validity Of The Regression Discontinuity Design For Estimating Electoral Effects: New Evidence From Over 40,000 Close Races.” Working paper.

Eggers, Andrew C. and Jens Hainmueller. 2009. “MPs for Sale? Returns to Office in Postwar British Politics.” American Political Science Review 103 (4): 513–533.

Fan, J. and I. Gijbels. 1996. Local Polynomial Modelling and Its Applications. Chapman & Hall.

Fotheringham, A. Stewart, Chris Brunsdon, and Martin Charlton. 1998. “Geographically Weighted Regression: A Natural Evolution of the Expansion Method for Spatial Data Analysis.” Environment and Planning 30: 1905–27.

Fotheringham, A. Stewart, Chris Brunsdon, and Martin Charlton. 2002. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Hoboken, NJ: Wiley and Sons.

Fotheringham, A. Stewart and D. Wong. 1991. “The Modifiable Areal Unit Problem in Multivariate Statistical Analysis.” Environment and Planning 23 (4): 1025–1044.

Gerber, Alan S., Daniel P. Kessler, and Marc Meredith. 2011. “The Persuasive Effects of Direct Mail: A Regression Discontinuity Based Approach.” Journal of Politics 73 (1): 140–155.

Gill, Jeff. 2002. Bayesian Methods: A Social and Behavioral Sciences Approach. Boca Raton, FL: Chapman & Hall/CRC.

Goldstein, Kenneth, Michael Franz, and Travis Ridout. 2008. “Political Advertising in 2008.” Combined file [dataset], final release.

Green, Donald P., Terence Y. Leong, Holger Kern, Alan S. Gerber, and Christopher W. Larimer. 2009. “Testing the Accuracy of Regression Discontinuity Analysis Using Experimental Benchmarks.” Political Analysis 17 (4): 400–417.

Hahn, Jinyong, Petra Todd, and Wilbert van der Klaauw. 2001. “Identification and Estimation of Treatments Effects with a Regression-Discontinuity Design.” Econometrica 69 (1): 201–209.

Hansen, Ben B. and Jake Bowers. 2008. “Covariate Balance in Simple, Stratified, and Clustered Comparative Studies.” Statistical Science 23 (2): 219–236.

Hernan, Miguel A. and Tyler J. VanderWeele. 2011. “Compound Treatments and Transportability of Causal Inference.” Epidemiology 22 (3): 368–377.

Hopkins, Daniel J. and Elisabeth R. Gerber. 2009. “When Mayors Matter: Estimating the Impact of Mayoral Partisanship on City Policy.” Unpublished Manuscript.

Huber, Gregory A. and Kevin Arceneaux. 2007. “Identifying the Persuasive Effects of Presidential Advertising.” American Journal of Political Science 51 (4): 957–977.

Imbens, Guido W. and Karthik Kalyanaraman. 2012. “Optimal Bandwidth Choice for the Regression Discontinuity Estimator.” Review of Economic Studies 79 (3): 933–959.

Imbens, Guido W. and Thomas Lemieux. 2008. “Regression Discontinuity Designs: A Guide to Practice.” Journal of Econometrics 142 (2): 615–635.

Imbens, Guido W. and Tristan Zajonc. 2011. “Regression Discontinuity Design with Multiple Forcing Variables.” Working Paper.

Keele, Luke, Rocío Titiunik, and Jose Zubizarreta. 2014. “Enhancing Geographic Discontinuities Through Matching.” Journal of the Royal Statistical Society: Series A. In press.

Keele, Luke J. and Rocío Titiunik. 2013. “Natural Experiments Based on Geography.” Unpublished Manuscript.

Kern, Holger L. and Jens Hainmueller. 2008. “Opium for the Masses: How Foreign Media Can Stabilize Authoritarian Regimes.” Political Analysis 17 (2): 377–399.

Krasno, Jonathan S. and Donald P. Green. 2008. “Do Televised Presidential Ads Increase Voter Turnout? Evidence from a Natural Experiment.” Journal of Politics 70 (1): 245–261.

Laitin, David D. 1986. Hegemony and Culture: Politics and Religious Change Among the Yoruba. Chicago, IL: University of Chicago Press.

Lee, David S. 2008. “Randomized Experiments From Non-Random Selection in U.S. House Elections.” Journal of Econometrics 142 (2): 675–697.

Lee, David S. and Thomas Lemieux. 2010. “Regression Discontinuity Designs in Economics.” Journal of Economic Literature 48 (2): 281–355.

Malpezzi, Stephen. 2002. “Hedonic Pricing Models and House Price Indexes: A Select Review.” In Kenneth Gibb and Anthony O’Sullivan, editors, Housing Economics and Public Policy: Essays in Honour of Duncan Maclennan. Oxford: Blackwell Publishing. pages 67–89.

Matzkin, Rosa L. 2007. “Nonparametric identification.” Handbook of Econometrics 6: 5307–5368.

Middleton, Joel A and Donald P Green. 2008. “Do community-based voter mobilization campaigns work even in battleground states? Evaluating the effectiveness of MoveOn’s 2004 Outreach Campaign.” Quarterly Journal of Political Science 3 (1): 63–82.

Miguel, Edward. 2004. “Tribe or Nation? Nation Building and Public Goods in Kenya Versus Tanzania.” World Politics 56 (3): 327–362.

Miles, William F.S. 1994. Hausaland Divided: Colonialism and Independence in Nigeria and Niger. Ithaca, NY: Cornell University Press.

Miles, William F.S. and David Rochefort. 1991. “Nationalism Versus Ethnic Identity in Sub-Saharan Africa.” American Political Science Review 85 (2): 393–403.

Nall, Clayton. N.d. “The Road to Division: Interstate Highways and Geographic Polarization.” Unpublished Manuscript.

Openshaw, S. 1984. The Modifiable Areal Unit Problem. Norwich, CT: Geo Books.

Papay, John P., John B. Willett, and Richard J. Murnane. 2011. “Extending the Regression-Discontinuity Approach to Multiple Assignment Variables.” Journal of Econometrics 161 (2): 203–207.

Porter, Jack. 2003. “Estimation in the Regression Discontinuity Model.” Unpublished Manuscript.

Posner, Daniel N. 2004. “The Political Salience of Cultural Difference: Why Chewas and Tumbukas Are Allies in Zambia and Adversaries in Malawi.” The American Political Science Review 98 (4): 529–545.

Reardon, Sean F. and John O’Sullivan. 2004. “Measures of Spatial Segregation.” Sociological Methodology 34 (1): 121–162.

Rosenbaum, Paul R. 2002. Observational Studies. New York, NY: Springer 2nd edition.

Rubin, Donald B. 1986. “Which Ifs Have Causal Answers.” Journal of the American Statistical Association 81 (396): 961–962.

Sheppard, Stephen. 1999. “Hedonic Analysis of Housing Markets.” In Paul Cheshire and Edwin S. Mills, editors, Applied Urban Economics. Elsevier volume 3 of Handbook of Regional and Urban Economics chapter 41. pages 1595–1635.

Sinclair, Betsy, Margaret McConnell, and Donald P. Green. 2012. “Detecting Spillover in Social Networks: Design and Analysis of Multilevel Experiments.” American Journal of Political Science 56 (4): 1055–1069.

Tam Cho, Wendy K. and Erinn P. Nicely. 2008. “Geographic Proximity versus Institutions: Evaluating Borders as Real Political Boundaries.” American Politics Research 36 (6): 803–823.

Trounstine, Jessica. 2011. “Evidence of a local incumbency advantage.” Legislative Studies Quarterly 36 (2): 255–280.

VanderWeele, Tyler J. 2009. “Concerning the Consistency Assumption in Causal Inference.” Epidemiology 20 (6): 880–883.

Wong, D. 2008. “A Local Multidimensional Approach to Evaluate Changes in Segregation.” Urban Geography 29 (3): 455–472.

Appendices

A.1 Interference

In the estimation of treatment effects in randomized experiments or observational studies, we must assume that there is no interference between units, that is, that the outcome of one unit is not affected by the treatment status of other units. Rubin (1986) called the assumption of no interference the “stable unit-treatment value assumption” or SUTVA.17 In the classic RD, SUTVA violations via interference are typically of little concern. For example, there is no reason to suspect that the outcomes for a student just under a scholarship threshold will be affected by the fact that a student just over the same threshold receives the treatment in the form of the scholarship. For a GRD design, however, SUTVA violations might be a concern. In our application, we would worry that a voter exposed to presidential ads may urge a voter not exposed to ads to vote. In general, we doubt such SUTVA violations occur, since it is unlikely that contagion occurs outside of households. In our case, households are close but not so close that we suspect easy spillovers since the area is suburban and the density of housing is not high. Recent work designed to detect treatment spillovers in the context of turnout finds little evidence for any contagion (Sinclair et al. 2012). That is, we doubt that citizens in urban areas actively discuss politics with their neighbors to the extent that they encourage increased participation. Moreover, a SUTVA violation of this kind would tend to bias the effect towards zero, so any positive effects should be conservative estimates. However, we need not ignore SUTVA violations of this type. One simple way to account for a possible SUTVA violation is to estimate treatment effects but exclude voters that might be nearly adjacent.
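The exclusion just described could be implemented in several ways. One minimal sketch, under assumptions of our own choosing, drops every unit whose nearest neighbor on the other side of the boundary is closer than a chosen gap. The function name, the planar coordinates in kilometers, and the example values are all hypothetical.

```python
import math


def drop_near_cross_boundary(units, min_gap_km):
    """Keep units whose nearest neighbor on the other side is at least min_gap_km away.

    units: list of (treated, x_km, y_km) tuples in a local planar projection.
    """
    kept = []
    for treated, x, y in units:
        # Distances from this unit to every unit with the opposite treatment status.
        gaps = (math.hypot(x - ox, y - oy)
                for o_treated, ox, oy in units if o_treated != treated)
        if min(gaps, default=math.inf) >= min_gap_km:
            kept.append((treated, x, y))
    return kept


# Made-up locations: the first two units sit 50 meters apart across the boundary.
units = [(True, 0.0, 0.0), (False, 0.05, 0.0), (True, 1.0, 0.0), (False, -1.0, 0.0)]
trimmed = drop_near_cross_boundary(units, min_gap_km=0.1)
```

Treatment effects would then be re-estimated on the trimmed sample and compared with the estimates from the full sample.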

A.2 Inference in Natural Experiments

One question that arises in the context of geographic natural experiments (and may arise in other types of natural experiments) is whether treatment assignment occurs at the individual level or at a more aggregated level of geography. With geographic natural experiments, one could argue that the assignment mechanism operates not at the individual level but at an aggregated geographic level. For example, Card and Krueger (1994) estimated the effect of increasing the minimum wage on employment by comparing fast food restaurants in New Jersey (where the minimum wage was increased) to restaurants in adjacent eastern Pennsylvania. One could argue that treatment occurs at the state level and, as such, that this is an experiment with two observations. Card and Krueger, however, analyze their data at a disaggregated level. In our application, it could be argued that the assignment mechanism occurs at the media market level or, instead, that voters choose to live on one side of the media market boundary by chance, which would lead to an individual-level assignment mechanism. If the former assignment mechanism holds, then we have two observations as well: the treated media market and the control media market. The difficulty is that in most natural experiments the analyst does not explicitly control the assignment mechanism, which means the level at which treatment is assigned remains ambiguous.

[17] SUTVA also includes an assumption that there is only one version of the treatment or, if there are multiple versions of the treatment, that the effects of these different treatments are the same.


Statistical inference in settings where neither random sampling nor random assignment of treatment has explicitly occurred must invariably rely on concepts such as superpopulations or hypothetical treatment assignment mechanisms. With natural experiments, where the object of inference is a treatment effect, one can assume the assignment mechanism is known and use randomization inference (Rosenbaum 2002). In our case, we assume that individual-level assignment occurs around the media market boundary, which serves as the discontinuity. As such, the assignment mechanism is one that focuses on uncertainty about where one lives, and therefore it is an individual-level assignment mechanism. Some would argue that in these cases one should report estimates with no standard errors since we are working with population data (Berk 2006). Others might argue that only inference based on Bayesian principles is sound (Gill 2002). These are questions about the philosophy of statistical inference that cannot be resolved here or perhaps anywhere. Such questions and debates are endemic to any situation where the analyst does not control either sampling or assignment through a random mechanism.
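To make the idea of an assumed individual-level assignment mechanism concrete, the following is a minimal sketch of randomization inference under the sharp null hypothesis of no treatment effect. It is an illustration only, not the inference procedure used in our application: we assume, hypothetically, that among units in a narrow window around the boundary every assignment with the observed number of treated units is equally likely, and the function names are our own.

```python
import random

def diff_in_means(outcomes, treated):
    """Difference in mean outcome between treated (1) and control (0) units."""
    t = [y for y, d in zip(outcomes, treated) if d == 1]
    c = [y for y, d in zip(outcomes, treated) if d == 0]
    return sum(t) / len(t) - sum(c) / len(c)

def randomization_p_value(outcomes, treated, n_draws=2000, seed=0):
    """Monte Carlo randomization test of the sharp null of no effect.

    Assumed mechanism: treatment labels are permuted across individuals,
    holding the number of treated units fixed. Returns the observed
    difference in means and a two-sided p-value.
    """
    rng = random.Random(seed)
    observed = diff_in_means(outcomes, treated)
    labels = list(treated)
    extreme = 0
    for _ in range(n_draws):
        rng.shuffle(labels)  # one draw from the assumed assignment mechanism
        if abs(diff_in_means(outcomes, labels)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return observed, (extreme + 1) / (n_draws + 1)

# Hypothetical turnout indicators for eight voters near the boundary.
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
observed, p_value = randomization_p_value(outcomes, treated)
```

If one instead believed that assignment operates at the media market level, the permutation would have to be carried out over markets rather than individuals, and with two markets the test would have essentially no power. The sketch therefore makes visible how much the inference depends on the assumed level of assignment.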

A.3 Proof of Proposition 1

We write the observed outcome as $Y_i = Y_{i0} + T_i(Y_{i1} - Y_{i0})$ and generalize the proof in Hahn et al. (2001). We let superscripts $t$ and $c$ denote locations that are in the treated and control areas, respectively, so that $s^c \in A^c$ and $s^t \in A^t$. Assuming that $\Pr(T_i = 1) = 1$ for all $i$ such that $S_i \in A^t$ and $\Pr(T_i = 0) = 1$ for all $i$ such that $S_i \in A^c$ (the discontinuity is sharp), that the density of $S_i$ is positive in a neighborhood around $b$, and Assumption 1, we have:

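$$
\begin{aligned}
E\{Y_i \mid S_i = s^t\} - E\{Y_i \mid S_i = s^c\}
&= E\{Y_{i0} \mid S_i = s^t\} - E\{Y_{i0} \mid S_i = s^c\} \\
&\quad + E\{T_i (Y_{i1} - Y_{i0}) \mid S_i = s^t\} - E\{T_i (Y_{i1} - Y_{i0}) \mid S_i = s^c\} \\
&= E\{Y_{i0} \mid S_i = s^t\} - E\{Y_{i0} \mid S_i = s^c\} \\
&\quad + 1 \cdot E\{(Y_{i1} - Y_{i0}) \mid S_i = s^t\} - 0 \cdot E\{(Y_{i1} - Y_{i0}) \mid S_i = s^c\}
\end{aligned}
$$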

Taking limits,

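$$
\begin{aligned}
\lim_{s^t \to b} E\{Y_i \mid S_i = s^t\} - \lim_{s^c \to b} E\{Y_i \mid S_i = s^c\}
&= \lim_{s^t \to b} E\{Y_{i0} \mid S_i = s^t\} - \lim_{s^c \to b} E\{Y_{i0} \mid S_i = s^c\} \\
&\quad + \lim_{s^t \to b} E\{Y_{i1} - Y_{i0} \mid S_i = s^t\} \\
&= E\{Y_{i0} \mid S_i = b\} - E\{Y_{i0} \mid S_i = b\} + E\{Y_{i1} - Y_{i0} \mid S_i = b\} \\
&= E\{Y_{i1} - Y_{i0} \mid S_i = b\} \\
&\equiv \tau(b)
\end{aligned}
$$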

A.4 Alternative Statement of Identification Assumption

We now consider a special case of Assumption 1 that is useful in applications. For a point $b$ on the boundary, we define a function $f_b(S): \mathbb{R}^2 \to \mathbb{R}$ that represents the distance between any point $S = (S_1, S_2)$ on the map and the boundary point $b = (b_1, b_2)$. Particular forms for $f(\cdot)$ include Euclidean and chordal distances and are discussed in Section 6.2.

For every boundary point $b$, this allows us to collapse the underlying two-dimensional score of unit $i$ into a single-dimensional score, $f_b(S_i)$, which is the two-dimensional distance between unit $i$'s location on the map and the location of boundary point $b$. Note that the estimand remains a plane since, for every unit $i$, there will be as many values $f_b(S_i)$ as points $b$ on the boundary. As we discuss below, since the number of boundary points is infinite, in practice we only consider a grid of points along the boundary, so that $f_b(S_i)$ needs to be calculated only for points on the grid.
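As an illustration of how the scalar score can be computed on a grid, the sketch below evaluates $f_b(S_i)$ for every unit and every grid point under two choices of distance. The function names, the Earth-radius constant, and the example coordinates are our own assumptions for illustration; the Euclidean version presumes projected planar coordinates, and the chordal version presumes latitude and longitude in degrees on a spherical Earth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)

def euclidean_distance(p, q):
    """Straight-line distance between two points in projected (x, y) coordinates."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def chordal_distance_km(p, q, radius=EARTH_RADIUS_KM):
    """Length of the chord through the sphere joining two (lat, lon) points in degrees."""
    def to_xyz(lat, lon):
        lat, lon = math.radians(lat), math.radians(lon)
        return (
            radius * math.cos(lat) * math.cos(lon),
            radius * math.cos(lat) * math.sin(lon),
            radius * math.sin(lat),
        )
    return math.dist(to_xyz(*p), to_xyz(*q))

def scores_for_grid(units, boundary_grid, distance):
    """Return scores[k][i] = f_b(S_i) for the k-th grid point b and the i-th unit."""
    return [[distance(s, b) for s in units] for b in boundary_grid]

# Hypothetical projected coordinates (km): three units, two boundary grid points.
units = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
grid = [(0.0, 0.0), (3.0, 4.0)]
scores = scores_for_grid(units, grid, euclidean_distance)
```

Each row of `scores` is the one-dimensional running variable for a single boundary point, so the same unit receives a different score at each grid point, as the text describes. A unit located exactly at $b$ has score zero under either distance.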

Assumption 3 (Continuity in a scalar function of two-dimensional score). The conditional regression functions are continuous in $f_b(b)$ at all points $b$ on the boundary:

$$
\begin{aligned}
\lim_{s \to b} E\{Y_{i0} \mid f_b(S_i) = f_b(s)\} &= E\{Y_{i0} \mid f_b(S_i) = f_b(b)\} \\
\lim_{s \to b} E\{Y_{i1} \mid f_b(S_i) = f_b(s)\} &= E\{Y_{i1} \mid f_b(S_i) = f_b(b)\}.
\end{aligned}
$$

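Since $f_b(b) = 0$, Assumption 3 can be recast as

$$
\begin{aligned}
\lim_{s \to b} E\{Y_{i0} \mid f_b(S_i) = f_b(s)\} &= E\{Y_{i0} \mid f_b(S_i) = 0\} \\
\lim_{s \to b} E\{Y_{i1} \mid f_b(S_i) = f_b(s)\} &= E\{Y_{i1} \mid f_b(S_i) = 0\}.
\end{aligned}
$$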

Under Assumption 3, unit $i$'s score, although a scalar, is a function of $i$'s two-dimensional location on the map, and the scalar score $f_b(S_i)$ of unit $i$ is defined relative to the boundary point $b$. Therefore, although this score is unidimensional for a fixed boundary point, it will vary as different points $b$ along the boundary are considered.

Under Assumption 3, the treatment effect identified by the GRD design at the boundary point $b$ is summarized in the following proposition.

Proposition 2 (Geographic treatment effect curve with scalar function of two-dimensional score). Assuming that $\Pr(T_i = 1) = 1$ for all $i$ such that $S_i \in A^t$ and $\Pr(T_i = 0) = 1$ for all $i$ such that $S_i \in A^c$ (the discontinuity is sharp), that the density of $S_i$ is positive in a neighborhood around zero, and Assumption 3, we have:

$$
\begin{aligned}
\tau(b) &= \lim_{s^t \to b} E\{Y_i \mid f_b(S_i) = f_b(s^t)\} - \lim_{s^c \to b} E\{Y_i \mid f_b(S_i) = f_b(s^c)\} \\
&= E\{Y_{i1} - Y_{i0} \mid f_b(S_i) = 0\},
\end{aligned}
$$

where $s^c \in A^c$ and $s^t \in A^t$.

A.5 Discussion of Assumption ??

We show what would happen under Assumption ?? if we used a boundary point $b_p$ to calculate the limit of the regression function for the treated side and a different point $b_q$ for the control side, the implicit comparison in the naive approach.

Taking limits at different points and invoking Assumption ?? and conditional independence,

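$$
\begin{aligned}
\lim_{s^t \to b_p} E\{Y_i \mid S_i = s^t\} - \lim_{s^c \to b_q} E\{Y_i \mid S_i = s^c\}
&= \lim_{s^t \to b_p} E\{Y_{i0} \mid S_i = s^t\} - \lim_{s^c \to b_q} E\{Y_{i0} \mid S_i = s^c\} \\
&\quad + T^t \lim_{s^t \to b_p} E\{Y_{i1} - Y_{i0} \mid S_i = s^t\} - T^c \lim_{s^c \to b_q} E\{Y_{i1} - Y_{i0} \mid S_i = s^c\} \\
&= E\{Y_{i0} \mid S_i = b_p\} - E\{Y_{i0} \mid S_i = b_q\} \\
&\quad + T^t E\{Y_{i1} - Y_{i0} \mid S_i = b_p\} - T^c E\{Y_{i1} - Y_{i0} \mid S_i = b_q\} \\
&= 0 + (1 - 0)\tau \\
&= \tau,
\end{aligned}
$$

where $\tau = E\{Y_{i1} - Y_{i0} \mid S_i = b_p\} = E\{Y_{i1} - Y_{i0} \mid S_i = b_q\}$.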

A.6 Connections to Geographic Weighted Regression

We proposed a framework for estimation in the GRD design that uses local linear regression to estimate the treatment effect at a given boundary point, using the two-dimensional distance of each data point to the boundary point as the score or running variable. In this framework, the treatment effect at each boundary point is estimated giving the highest weight to data points that are closest to the boundary point. This method is closely related to geographically weighted regression (GWR) (Fotheringham et al. 1998, 2002), a technique commonly employed in geography and spatial econometrics. In GWR, an outcome is modeled as a function of one or more covariates, and the coefficients on these covariates are allowed to vary at different geographic locations, giving higher weight in the estimation to observations near the location of estimation. GWR is typically used to gauge how the relationship between the dependent and independent variables changes through space. The focus of a GRD design, in contrast, is on estimating the change in the average outcome that occurs exactly at a boundary point when a treatment is active on one side of the boundary but inactive on the other. Because the goal is to estimate the change in the conditional expectation of the outcome at the boundary point, GRD designs estimate the model separately on either side of the border. Also, unlike typical GWR analysis, GRD designs do not include covariates on the right-hand side, because the identification assumption is typically about continuity of the regression function conditional on the score alone, although this could be relaxed to continuity conditional on some set of pre-treatment covariates. Thus, in GRD designs there is typically no interest in describing the relationship between outcome and covariates generally through space; interest lies only in the boundary points, and the relationship of interest is usually between the outcome and the score alone.
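The estimation logic described above can be sketched in a few lines. This is a simplified illustration, not the rdrobust implementation used for our results: it assumes a triangular kernel, a single user-supplied bandwidth `h`, and no bias correction or variance estimation, and all function names are hypothetical. The outcome is regressed on the distance to the boundary point alone, separately on each side, and the effect is the difference between the two fitted intercepts.

```python
def triangular_kernel(u):
    """Kernel weight that declines linearly to zero at |u| = 1 (assumed choice)."""
    return max(0.0, 1.0 - abs(u))

def local_linear_intercept(dist, y, h):
    """Kernel-weighted least squares of y on (1, dist); returns the fit at dist = 0."""
    w = [triangular_kernel(d / h) for d in dist]
    sw = sum(w)
    if sw == 0:
        raise ValueError("no observations within the bandwidth")
    dbar = sum(wi * di for wi, di in zip(w, dist)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (di - dbar) ** 2 for wi, di in zip(w, dist))
    if sxx == 0:
        raise ValueError("need at least two distinct distances within the bandwidth")
    sxy = sum(wi * (di - dbar) * (yi - ybar) for wi, di, yi in zip(w, dist, y))
    slope = sxy / sxx
    return ybar - slope * dbar

def grd_effect_at_point(dist_t, y_t, dist_c, y_c, h):
    """Difference in boundary-point intercepts, treated side minus control side."""
    return local_linear_intercept(dist_t, y_t, h) - local_linear_intercept(dist_c, y_c, h)

# Hypothetical noiseless data: distances (km) to one boundary point b.
dist = [0.2, 0.4, 0.6, 0.8]
y_treated = [0.7 + 0.1 * d for d in dist]   # intercept 0.7 at the boundary
y_control = [0.5 - 0.2 * d for d in dist]   # intercept 0.5 at the boundary
tau_hat = grd_effect_at_point(dist, y_treated, dist, y_control, h=1.0)
```

Repeating the calculation with the distances recomputed for each grid point traces out the estimated effect along the boundary. In contrast to GWR, no covariates enter the regression and the two sides are never pooled.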

Nonetheless, our treatment of borders and the discontinuity in treatment assignment they generate is related to the treatment of borders in geography. For example, Tam Cho and Nicely (2008) study whether counties that are near and on opposite sides of state borders share political similarities. Analyzing all counties in the U.S., the authors find that there is positive spatial autocorrelation in political tendencies (as measured by presidential vote) between counties on a state border and adjacent counties within the same state. But the spatial autocorrelation is no longer significant when the authors analyze the relationship between border counties and their adjacent neighbors in a different state, suggesting that political tendencies change abruptly at state borders. Applying this type of analysis to GRD designs would be fruitful and could provide researchers with one additional tool to gauge the plausibility of the identification assumptions. In particular, an ideal GRD would be one where there is high and positive spatial autocorrelation in a neighborhood of the border between units on both sides of the border, indicating that the two groups compared in the GRD design are indeed similar. Since the treatment would of course be a difference between the adjacent areas, and that difference will itself affect the spatial autocorrelation measure, one would need to look at pre-treatment spatial autocorrelations, in the same spirit as one looks at balance statistics on pre-treatment covariates.
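One way such a diagnostic could be computed is sketched below using Moran's I restricted to cross-border pairs. This is our own illustrative construction, not an analysis reported in the paper or the exact statistic used by Tam Cho and Nicely (2008): we assume binary weights equal to one when two units lie on opposite sides of the boundary and within `max_dist` of each other, and zero otherwise. The function name and example data are hypothetical.

```python
import math

def cross_border_morans_i(values, sides, coords, max_dist):
    """Moran's I for a pre-treatment covariate using cross-border neighbor pairs only.

    values: covariate value per unit; sides: 0/1 side of the boundary;
    coords: projected (x, y) pairs; max_dist: neighbor cutoff (assumed).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num = 0.0
    total_w = 0.0
    for i in range(n):
        for j in range(n):
            # Weight is 1 only for nearby units on opposite sides of the boundary.
            if i != j and sides[i] != sides[j] and math.dist(coords[i], coords[j]) <= max_dist:
                num += dev[i] * dev[j]
                total_w += 1.0
    if total_w == 0 or denom == 0:
        raise ValueError("no cross-border neighbor pairs, or covariate has no variation")
    return (n / total_w) * (num / denom)

# Hypothetical example: each unit has one close neighbor across the boundary (y = 0).
coords = [(0.0, 0.1), (1.0, 0.1), (0.0, -0.1), (1.0, -0.1)]
sides = [1, 1, 0, 0]
similar = cross_border_morans_i([1.0, 3.0, 1.0, 3.0], sides, coords, max_dist=0.5)
dissimilar = cross_border_morans_i([1.0, 3.0, 3.0, 1.0], sides, coords, max_dist=0.5)
```

Values near one indicate that units facing each other across the boundary resemble one another on the pre-treatment covariate, which is the pattern an ideal GRD would display. Values near zero or below would cast doubt on the comparability of the two sides.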

A.7 Additional Empirical Results

In this section, we report additional empirical results for both the outcomes and important pre-treatment covariates. Table A1 shows the result of the local generalized linear polynomial model with spatial weights, analogous to Table 3 in the paper, but with bandwidth chosen at each boundary point according to the MSE-optimal method proposed by Calonico et al. (2013c). Once again, the point estimate is simply the difference in turnout proportion across treated and control areas at each boundary point. As can be seen, choosing optimal variable bandwidths at every point yields results similar to those obtained with the fixed 1.1 kilometer bandwidth: point estimates are statistically insignificant, with confidence intervals that include zero.

Table A1: Effect of presidential ads on 2008 Turnout - Local GLM, optimal variable bandwidth

Boundary point   Bandwidth   Point Estimate   p-value   95% CI
1                5.32        0.043            0.46      [-0.07, 0.16]
2                2.99        0.004            0.96      [-0.15, 0.16]
3                4.63        0.070            0.34      [-0.08, 0.22]

Note: Estimates are fit with local linear polynomial model with spatial weights. Mean-squared-error optimal bandwidth (in kilometers) chosen separately at every point. Point estimate is difference in turnout proportion across treated and control areas.

Table A2 shows local generalized linear polynomial results for pre-treatment covariates available in the voter file. These covariates are age and female, black, Hispanic and Democratic registration indicators at the individual level.


Table A2: Placebo effect of presidential ads on pre-treatment covariates

                     Conventional inference              Robust inference
Point  Covariate     CI                h     NTr   NCo   CI                h     NTr    NCo
1      Age           [-7.93, 58.369]   1.10  233   574   [-11.04, 4.188]   2.30  2265   1408
2      Age           [6.31, 26.191]    1.10  401   342   [8.16, 16.544]    2.86  3859   2995
3      Age           [-12.43, 16.222]  1.10  324   326   [0.21, 13.048]    2.78  3338   3963
1      Black         [-0.1, 0.069]     1.10  233   574   [-0.13, 0.007]    2.36  2351   1525
2      Black         [-0.18, 0.019]    1.10  401   342   [-0.07, 0.043]    2.58  2760   2577
3      Black         [-0.09, 0.027]    1.10  324   326   [-0.05, 0.003]    1.03  243    246
1      Hisp          [-0.17, 0.073]    1.10  233   574   [-0.07, 0.018]    3.32  4785   3315
2      Hisp          [-0.33, -0.015]   1.10  401   342   [-0.09, 0.014]    2.96  4197   3299
3      Hisp          [-0.1, 0.07]      1.10  324   326   [-0.05, 0.049]    2.72  3188   3836
1      Dem           [-0.38, 1.324]    1.10  233   574   [0.03, 0.179]     6.54  13885  9460
2      Dem           [-0.37, 0.253]    1.10  401   342   [-0.25, -0.004]   2.95  4135   3283
3      Dem           [-0.49, 0.116]    1.10  324   326   [-0.29, -0.105]   3.86  6484   7080
1      Female        [-0.95, 0.85]     1.10  233   574   [0.02, 0.204]     5.53  11603  8937
2      Female        [-0.41, 0.305]    1.10  401   342   [-0.06, 0.193]    4.18  7878   7958
3      Female        [-0.44, 0.47]     1.10  324   326   [-0.09, 0.141]    6.03  14023  9460

[…] distance to the point of estimation. Estimate indicates the point estimate (difference in pre-treatment covariate across treated and control areas) using the fixed bandwidth h in the conventional inference column; NTr and NCo indicate the effective sample size used for estimation in the treated and control areas, respectively; h indicates the bandwidth used (in km). All results estimated with package rdrobust (Calonico, Cattaneo, and Titiunik 2013b,a). In conventional inference, bandwidth is chosen manually. In robust inference, all bandwidths are chosen with CCT MSE minimization method. MSE-optimal pilot bandwidths used in robust confidence intervals are 3.16, 3.72, 3.11, 2.98, 3.03, 1.08, 3.90, 3.56, 2.97, 6.39, 3.86, 5.26, 5.96, 4.27 and 5.40, for every boundary point and variable in the order reported in the table, respectively. Covariates are indicators equal to one if individual is black, Hispanic, registered Democrat or female.
