
DATA FILES, FOOTNOTES, AND EDITORS: Bridging Quantitative, Qualitative, and Editorial Transparency Practices

Abstract: The three-hundred-year footnoting tradition has given rise to very rich and diverse transparency practices for scholars working with documents. The current DA-RT guidelines inadequately reflect those practices, as they embody more narrowly the transparency practices employed in data files. This paper extrapolates the footnoting-inspired transparency criteria. It shows how these criteria can be grafted onto the broad DA-RT guidelines and how their relevance is not confined to qualitative research. The paper illustrates the benefits of these transparency criteria by applying them to the literature on the origins of proportional representation.

“There is a crack in everything, that’s how the light gets in.”

(Leonard Cohen, “Anthem”, 1992)

Responding to publish-or-perish pressures, scholars try to impress by presenting their research as edifices with cracks just large enough to signal methodological competence, but not so large as to let the reader, and especially the reviewer, see all the duct tape holding the edifice together. This tension between transparency and expediency exists across different fields of knowledge. Journalists engage in fact-checking feuds with politicians, scientists grapple with how to address the “replication crisis”, and political scientists currently debate the Data Access and Research Transparency (DA-RT) initiative.[footnoteRef:1] Some political scientists see DA-RT as little more than a costly, ineffective, and methodologically biased band-aid; they argue that it distracts from deeper problems such as journals’ dislike of footnotes, shrinking word limits, non-transparent review processes, and increasing publish-or-perish pressures. (Büthe and Jacobs 2015; Golden and Golden 2016) This realist perspective contrasts with the DA-RT advocates’ more idealistic stance. Those advocates contend that transparency improves the dialogue among scholars, publicly legitimates the social sciences, and builds sounder epistemological foundations. (Lupia and Elman 2014) [1: Peregrine Schwartz-Shea and Dvora Yanow provide a very careful and insightful review of DA-RT’s genealogy. (2016)]

Unfortunately, these realist and idealist perspectives frequently talk past each other. Idealists call on individual scholars to improve their research practices to generate more reliable knowledge. They view publishing as a neutral, meritocratic process that ultimately rewards transparent research. Realists, in turn, link reliable knowledge to greater methodological pluralism, which DA-RT threatens. This paper builds a bridge between these seemingly conflicting perspectives. It hopes to convince realists that the DA-RT guidelines can accommodate methodological pluralism and to persuade idealists that effective transparency also requires rethinking the incentives of current publishing practices. In short, the paper argues that improving transparency practices within existing publishing practices is unlikely to succeed.[footnoteRef:2] [2: Sociological Science and PLoS (Public Library of Science) are two new journals experimenting with new approaches to peer review and research transparency.]

This paper builds such a bridge in two ways. First, it addresses the realists’ concern by explicating the potential pluralism of the current DA-RT guidelines. It argues that those guidelines, while ecumenical in spirit, reflect too narrowly the transparency practices embodied in data files. (Isaac 2015, 271–72; Hall 2016; Trachtenberg 2015; Desch 2015) The paper extends those existing guidelines by drawing on the transparency practices embedded in footnotes, which are most directly pertinent to document-based research. It focuses less on the transparency practices that pertain to human subjects research. (Bleich and Pekkanen 2013)

Footnotes are helpful for reflecting on the existing DA-RT guidelines for two reasons. First, the introduction of footnotes around 1650 provided the first means of giving backstage access to qualitative research. By the early 19th century, historians had refined their footnoting and established a rich inventory of transparency practices. (Grafton 1999, 205–210, 34–61) Second, historical footnotes address the pre-testing stages of research: theorizing and test construction. Unlike the data-file template, they thus concentrate on more than just the final testing stage of social inquiry.

The second section evaluates the idealists’ claim that transparency produces tangible pay-offs. It does so by evaluating how closely the contributors to the literature on the origins of proportional representation (PR) in the early twentieth century followed the three key elements of transparency. It scrutinizes how readily the evidence in each article could be located and accessed (i.e. data access); how fully each contribution shares the challenges that it faced in collecting or generating the evidence (i.e. production transparency); and how explicitly it discusses the difficulties of drawing inferences from this evidence (i.e. analytical transparency). (Elman and Kapiszewski 2014; Lupia and Elman 2014) It then explores further how compliance with those transparency standards reduced the very research errors that they seek to pre-empt or, at least, whose detection they are meant to facilitate. The benefits of research transparency thus have to be evaluated in terms of reducing research errors; they are too indirectly linked to causal inference to be evaluated in terms of the robustness of test results. Understanding the pay-offs of various DA-RT criteria should shift the debate over their merits from epistemological to more empirical grounds.

The PR literature is well suited for building a bridge between the DA-RT camps. It encompasses nine contributions putting forth six different explanations. This lack of scholarly convergence can be read as a micro version of the larger replication crisis that is making regular headlines. It also provides a test case for applying qualitative-inspired transparency practices to quantitative research because eight of the nine contributions engage in quantitative hypothesis testing. The second section re-reads these nine contributions from a footnoting perspective to see how much confidence we can have in their respective test results in light of the data access, production transparency, and analytical transparency they provide. Ultimately, this backstage visit demonstrates that more comprehensively formulated, footnote-inspired transparency guidelines will make research more difficult (and hopefully better) for all scholars regardless of their methodological orientations. It shows that the transparency provided by data files and computer code ignores key elements of sound causal inference and thus is insufficient.

1. What Are the Transparency Standards Embedded in Footnoting?

Research transparency rests on the assumption that published research findings are generated with the help of a scaffold that is largely dismantled prior to publication. (KKV 1994, 13) Scholars have a considerable degree of freedom in what they leave standing, which makes confidence in causal inferences conditional not just on the test results but also on the judgments linked to theorizing and test construction that precede the testing itself. In other words, the “researcher degree of freedom” (Simmons et al. 2011, 1) in theorizing and test construction affects confidence in test results just as much as the better known “degrees of freedom” that the attributes of the data impose on the testing itself. DA-RT is premised on the assumption that confidence in causal inference increases if transparency is imposed on both the testing and pre-testing stages of social inquiry. It has published largely data-file-inspired guidelines that cover both these stages of inquiry (APSA 2016) but also has invited qualitative research traditions to revise those guidelines to assure pluralistic transparency standards. This section follows up on this invitation by explicating the transparency practices embedded in the discursive footnotes used by scholars working with documentary evidence.

1.1. Data Access

Having access to the data used during the testing and even pre-testing stages of social inquiry is key to verifying and even replicating reported results. Qualitative scholars have widely read the DA-RT data access guidelines as a tacit critique that existing footnoting practices do not provide the same data access as data files. This critique might explain why this aspect of research transparency has received particularly strong pushback from qualitative scholars. (Büthe and Jacobs 2015; Hall 2016; Isaac 2015)

Table 1 summarizes how footnoting complements data files in assuring data access in the two ways specified by DA-RT. First, precise footnoting identifies the sources from which data was selected. Second, new forms of footnoting, referred to as active citation, hold the promise of making more textual evidence or research notes available, and thus of creating the qualitative equivalent to data files.[footnoteRef:3] [3: DA-RT identifies unreported findings as a third element of data access. Unreported findings are meant to provide access to null findings or the rationale for the construction of particular tests. Making such background information more transparent is meant to reduce publication bias or arbitrary test construction (i.e. p-hacking, data mining). Appendix F discusses unreported findings in more detail.]

1.1.1. Data Location

The mechanics of properly citing sources are important because they facilitate fact checking. Footnotes are the principal instrument for assuring access to qualitative, documentary evidence or to the sources from which data sets were generated. Easily locating such material requires an actual footnote containing an accurate citation and an actual page number. (Lustick 1996b; Steel 1996) Footnotes citing primary material require more details because archival material is too complexly organized to be readily located with a single page-numbered footnote. (Trachtenberg 2015, 15–16)

1.1.2. Data Sharing

The traditional footnote is essential for locating evidence, but it is limited in its ability to become a repository for actual data. Active citations try to expand the data-sharing capacity of footnotes. Their goal is to make extended sections of archival material available, both to facilitate fact checking and to provide sufficient contextual information to allow the reader to verify how a scholar interpreted the source material and what he or she selected. (Moravcsik 2010) The Center for Qualitative and Multi-Method Inquiry is experimenting with a qualitative data repository.

1.2. Production Transparency

Production transparency retraces the steps taken to convert the facts located in a particular source into the evidence made available to the reader. It assesses the ease with which the production of quantitative or qualitative evidence can be retraced, which ultimately is crucial for assessing its quality. The current DA-RT guidelines closely echo the quantitative, data-file template of production transparency and are less specific about qualitative production transparency. DA-RT only recommends that scholars discuss how their documentary sources or interviews were selected and sampled, or how ethnographic work was conducted. (APSA 2016, 18–19) These guidelines consequently leave out many of the production transparency practices that qualitative scholars discuss in their discursive footnotes.

An important starting point for retrieving the production transparency practices embedded in footnoting is the semantic distinction that emerged in the 17th century between facts and the evidence into which they are eventually converted. Lorraine Daston points out that “facts are robust in their existence and opaque in their interpretation. Only when enlisted in the service of a claim or a conjecture do they become evidence, or facts with significance.” (1991, 94; Beach and Pedersen 2013, 120–33; Evans 1997, 75–103) This semantic distinction underscores that qualitative evidence involves a production process just as complex and self-reflective as the conversion of observations into quantitative data files. The rules of evidence in law and source criticism in history guide the conversion of facts into evidence and thus are the functional equivalent of DA-RT’s production transparency.

Qualitative scholars emphasize that footnotes dealing with documentary evidence focus on three elements that relate to production transparency. First, they assess the quality of sources to make transparent the credibility of the evidence used. Second, they discuss the recording of evidence to make more transparent potential factual inaccuracies. Third, they review the range of available sources to make more transparent the diversity of the selected evidence. Table 2 summarizes these three elements.

1.2.1. Quality of Sources

Historians engage in “source criticism” to give readers a sense of the quality, completeness, and potential biases of their primary sources. (Howell and Prevenier 2001, 60–68) They indirectly provide a template for assessing secondary sources. Their quality can be assessed in terms of the reputation of the press or journal publishing the material. It also can be evaluated in terms of its subsequent critical reception or successful replications. Evan Lieberman proposes evaluating secondary sources by analyzing the proximity of evidence to the actual process it represents. He believes evidentiary proximity is important because qualitative evidence is “extremely vulnerable to conflicting interpretations, and for larger-scale units, over long periods of time, substantial concerns about what to observe are likely to arise.” (2010, 40–41) He points out that some scholars directly observe and record an event and thus produce direct evidence (i.e. participant observation, film footage). The same goes for scholars who use documents written by individuals with first-hand knowledge of the events (i.e. letters, journalistic accounts, transcripts, diaries, governmental archives). Scholars combining such primary evidence with other published materials produce secondary, less proximate evidence, which turns into tertiary, even less proximate evidence when it is compiled in the form of “encyclopedias, fact books, datasets and databases.” (2010, 41)

1.2.2. Recording & Coding

Daston’s definition of evidence as facts with significance underscores that facts are purposively selected, combined, and interpreted before they acquire that significance. Put differently, the creation of evidence involves selecting a measurement instrument that converts nominal observations into interval data. Production transparency dictates that scholars should be as accurate as possible about the recording, re-arranging, or coding of facts and readily footnote any ambiguities they encountered. The recording process should be free of factual errors and readily withstand fact checking. (Jackson and Jamieson 2007)

1.2.3. Range of Sources

Thomas Kuhn famously observed that evidence is “theory laden” because theory determines what information becomes evidence and what remains mere facts. Such theory-ladenness invariably leads to the oversight of counter-evidence predicted by alternative explanations as well as of silent evidence unpredicted by any existing theory. The goal of production transparency is to draw attention to such overlooked evidence by reflecting on the diversity of sources. (B&C 2014, 26; Hall 2003, 394) Qualitative scholars highlight three particular types of diversity: methodological, temporal, and theoretical. Methodological diversity matters because qualitative evidence is less standardized and subject to fewer ontological pre-assumptions. Qualitative data thus is more diverse and provides a thicker, more complex representation of social reality than quantitative data. (Becker 2007, 55–70; Hall 2003) Temporal diversity is important because it helps make transparent unstated boundary conditions. Looking at sources written at the time events occurred permits reading history forward and pre-empts the “creeping determinism” or functionalism that comes from reading history backward. (Fischer 1970, 130–50; Fischhoff 1982, 341) Alan Jacobs also shows how attention to temporal dynamics plays an important role in untangling endogeneity and multicollinearity problems. (2014, 50–56; see also Pierson 2003) Third, theoretical diversity matters because it highlights the different explanatory factors privileged by different theoretical frames. (Abbott 2004; Lustick 1996a)

1.3. Analytical Transparency

DA-RT broadly defines analytical transparency as “clearly mapping the path from data to the claims.” It advises quantitative scholars to make available their scripts or .do files and qualitative scholars to “describe relevant aspects of the overall research process, detail the micro-connections between their data and claims (i.e., show how the specific evidence they cite supports those claims), and discuss how evidence was aggregated to support claims.” (APSA 2016, 19) The broader goal of analytical transparency is to prevent data mining or its qualitative cousin, historical tourism. Both are to valid causal inference what hagiographies are to artful biographies; they embellish analysis by ignoring disconfirming evidence, engaging in ex post theorizing, overlooking alternative explanations, and backgrounding ontological complexities. (Yom 2015, 636–39) Analytical transparency aims to make such disguising moves easier to detect.

DA-RT’s qualitative analytical transparency guidelines are very short and do not extend beyond the three lines cited in the previous paragraph. This brevity is surprising and hints at the committee’s difficulty in articulating, for qualitative scholars, analytical transparency equivalents to the .do files used by quantitative scholars. It also can be interpreted as reflecting a narrow, very test-centric view of causal inference problems and the related but mistaken belief that .do files make all those problems transparent. (Yom 2015, 636)

DA-RT overlooks five analytical transparency considerations that are of particular but not exclusive concern to qualitative scholars. Table 3 summarizes those considerations. They include attention to the prior foreknowledge against which new test results update our knowledge, the number of predictions a theory makes, the precision of those predictions, the degree to which a test engages alternative explanations, and how attentive a test is to the temporal structure of the evidence. Qualitative scholars would contend that footnotes concern themselves with those five elements of analytical transparency because .do files address them inadequately.

1.3.1. Priors

Scholars who publish less generally read more for each page they publish, have a greater sensitivity to the evolving nature of knowledge, and recognize that their findings benefit from a detailed, richly footnoted, and hence transparent dialogue with the available foreknowledge. Explicitness about available and consulted prior knowledge becomes a crucial element of analytical transparency that helps the reader place a specific finding in the broader context of knowledge production. It helps to identify whether a particular scholar adheres to a narrow, test-result-focused, publish-or-perish mode of scholarship or whether she abides by a more expansive, exploratory, and answer-seeking tradition. Existing scholarly practices acknowledge the importance of making transparent the dialogue between new research and the available foreknowledge. Bayesian analysis requires researchers to express the prior probability that a hypothesis is correct; standard methodology texts talk about the external validity of test results; and historians of science talk about research cycles. (Geddes 2003, 1–27; Lieberman 2016; Trachtenberg 2006, 55–67) A few rudimentary bibliometric indicators capture the transparency of such priors. (Fanelli and Glänzel 2013; Simkin and Roychowdhury 2006) These indicators involve counting how many of the available prior works an author cites, how many of those cited works the scholar substantively engages through a more detailed discussion, and how many prior works validate a new theoretical claim.
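To make the Bayesian point concrete, consider the standard formulation of Bayes’ rule; this is a generic illustration rather than a formula drawn from the DA-RT guidelines or any of the works discussed here:

P(H | E) = P(E | H) P(H) / [ P(E | H) P(H) + P(E | ¬H) P(¬H) ]

where P(H) is the prior probability that the hypothesis is correct and P(H | E) is the posterior probability after observing evidence E. The less transparent an author is about the foreknowledge informing P(H), the harder it is for the reader to judge how much a reported test result should update that prior.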

1.3.2. Specificity of Theory

Not all theories are created equal, and analytical transparency should clarify three differences that are observable during the pre-testing, theorizing stage of analysis. First, theories vary in the number of empirical implications that they generate. The more predictions a theory makes, the more specific it becomes. (B&C 2014, 30; Evera 1997, 30–34) KKV, for example, define a good theory as being “capable of generating as many observable implications as possible. [Such theorizing] will allow more tests of the theory with more data and greater variety of data and will put the theory at risk of being falsified more times.” (KKV 1994, 19; Gerring 2012, 74–100; Hall 2003, 392–95) Second, theories vary in how many predictions they make relative to their competitors. Confirming a theory that makes a single prediction while disconfirming an alternative theory that makes five predictions inspires far less confidence than the reverse test configuration. Third, theories vary in how many of their theorized implications are tested against actual empirical evidence. Statistical models frequently test only a subset of the implications made by the broader theory. (Walt and Mearsheimer 2013, 338–45)

1.3.3. Units of Analysis

Besides the number of their predictions, theories also vary in the granularity of each individual prediction. This granularity is related to the level of analysis at which a theory makes a prediction. Social scientists talk about levels of analysis to differentiate the granularity or concreteness of evidence. (Gerring 2007, 235; KKV 1994, 199; Gaddis 2002, 25; Kittel 2006) They frequently face the dilemma of wanting to choose the unit of analysis that is methodologically most appropriate but having to settle for a less granular unit at which data is available. Analytical transparency requires scholars to discuss such dilemmas and reflect on how their choice might affect the robustness of their results. Ideally, the unit of analysis should be pegged at the same level at which the theory predicts a piece of evidence. (Gerring 2012, 90–91; Lieberson 1985, 107–15) For example, it is not valid to judge the quality of a particular school in terms of the quality of its school district. Such an interpretation would commit the fallacy of division, which assumes that what is true for a larger unit (e.g. the district) also is true for its parts or smaller units (e.g. the school). Vice versa, it would be invalid to draw inferences about the quality of a school district by using evidence about the quality of a particular school. (Wheelan 2013, 51–54) Such an inference would commit the fallacy of composition. The risk of either fallacy is lowered to the extent that evidence for a particular prediction is observed at the same unit of analysis at which it is predicted.

1.3.4. Test Construction

The goodness of fit between evidence and a given hypothesis is only one element of effective testing. The inferential leverage of this fit increases significantly to the extent that a test also engages alternative explanations and puts them into a head-to-head, fully symmetrical contest with the test hypothesis. Peter Hall writes that “progress in social science is ultimately ... based on a three-cornered comparison among a theory, its principal rivals, and a set of observations.” (2003, 392) Test construction analyzes the configuration of such three-cornered comparisons. It has recently received attention among psychologists, who talk about “researcher degrees of freedom” in constructing tests, (Simmons et al. 2011, 1350) and Bayesian process tracers, who talk about the strength of a particular test configuration. (B&C 2014, 23–24; Evera 1997, 30–31; Hall 2003, 392; Stinchcombe 1987, 18–22) Both groups of scholars point out that analytical transparency requires scholars to fully disclose all the alternative explanations they tested, how many control variables they selected, and according to what criteria. Strong tests make not just precise and plentiful predictions about what evidence should be found, but also predictions about what evidence should not be found. Strong tests leverage the empirical implications of their competitors to make predictions about null findings. In doing so, they increase the internal validity of a theory by controlling for and thereby eliminating the fullest range of possible alternative explanations. (Hall 2003, 394; Rueschemeyer 2003, 317–18; Yom 2015, 635) The goodness of fit across an ever increasing sample of evidence increases the external validity of a theory, and the decrease of counter-evidence (i.e. predicted null findings) increases its internal validity.

1.3.5. Temporal Transparency

Social science inquiry rests on ontological assumptions about time that often are insufficiently acknowledged even though they can bias causal inferences and affect data quality. Elaborating on this issue, Peter Hall advises scholars to align their methodologies with their ontologies. He particularly points out that individual pieces of evidence oftentimes are interdependent or change across time and thus fail to meet the conditions of unit homogeneity and conditional independence on which regression analysis rests. (2003, 382) Paul Pierson, in turn, points out that the temporal structures through which causes are related to their effects vary significantly. As Table 4 shows, he differentiates between the time horizons of independent and dependent variables to generate a four-fold temporal structure. He contends that most analysts treat all causal relationships as if they were short/short. (2003)

Finally, Stanley Lieberson points out that causal relationships can be reversible but most theories are asymmetrical because they only theorize first-time causation and not potential subsequent reversals. (1985, 63–87) Analytical transparency should address the assumptions a particular test makes about the temporal attributes of the data, the causal structure, and causal symmetry.

In sum, this first section shows that the transparency practices associated with footnoting fit easily into the three transparency categories defined by DA-RT. It underscores DA-RT’s potential to be expanded and made genuinely pluralistic. Most likely, realist scholars will remain unimpressed and question the pay-off of even expanded transparency guidelines. Conversely, idealist scholars, who favor transparency guidelines more in line with data-file practices, will remain unconvinced by many of the new criteria, especially the analytical transparency ones. The next section addresses this dual skepticism by evaluating the transparency practices of the literature on the origins of PR.

2. How Much of a Difference Does Research Transparency Make?

The DA-RT guidelines have a gentle, diplomatic tenor, as their drafters hoped to highlight their benefits and assumed that those benefits would be largely self-explanatory. The guidelines also assume that the benefits accrue uniformly across the three transparency categories. The pushback against DA-RT underscores that those assumptions are not uniformly shared. This section therefore assesses the benefits of research transparency and evaluates which of DA-RT’s three elements produces the biggest ones. It does so by scrutinizing the transparency practices of the PR literature.

This scrutiny involves a two-step process. I first explore how closely the contributions to the PR literature follow, either explicitly or implicitly, the transparency practices discussed in the previous section. In a second step, I inquire to what extent non-compliance with those practices produces what I will call research errors. I use research error as an umbrella term to capture the absence of the benefits associated with particular transparency practices. Research errors include the inability to locate evidence, verify factual claims, replicate codings, identify units of analysis, understand test constructions, or confirm ontological assumptions. They can be thought of as judgment errors that scholars make during the pre-testing stage of analysis, distinct from the technical, testing errors occurring during the final, causal inference stage. The focus on compliance with transparency practices and potential research errors serves to demonstrate that the transparency practices embedded in footnotes have distinct benefits that are only inadequately captured by the current, data-file-centric DA-RT guidelines. It further shows that production transparency and particularly analytical transparency have the greatest pay-offs, and it confirms the realists’ claim that for qualitative scholars data access is little more than epistemological red tape. This section begins with a short synopsis of the PR literature. It then looks at the transparency practices of the nine contributions as well as their research errors.

2.1. PR Literature

At first sight, the nine works on the origins of PR give the impression of far-reaching differences, as most seek to reject Carles Boix and Stein Rokkan’s original left threat thesis and supplant it with their own explanation. The left threat thesis contends that incumbent conservative parties sought new ways to contain both the electoral and the policy threat posed by socialist parties and the quick electoral gains they made following the expansion of the franchise. This threat was most pronounced where the left had sizeable electoral strength and the right was fragmented, risking split votes and making socialist victories more likely. The adoption of PR became attractive in those circumstances because it eliminated the need for different conservative parties to coordinate their electoral candidates to avoid the risk of vote splitting that existed under various non-PR systems. (Boix 1999; Rokkan 1968) On closer inspection, though, these differences are not quite as stark, as all the explanations share two elements. First, they all contend that franchise expansion led actors to consider adopting new electoral systems. Second, they agree that these considerations were guided by basic cost-benefit calculations. The explanations differ mostly with respect to three elements: the circumstantial factors constraining actors’ choices, the experiences shaping their preferences, and where in time causal factors are located.

Alberto Alesina & Edward Glaeser (hereafter A&G), Josephine Andrews and Robert Jackman (hereafter A&J), and Ernesto Calvo all define parties as instrumental, short-term strategic office seekers who choose electoral systems in terms of the efficiency with which they translate their own, and only their own, vote share into seats. (2004; 2005; 2009) The three accounts differ in how constrained they consider actors’ choices to be and how removed in time these constraints are.

A&G provide the most proximate explanation, in which actors are the least constrained. They argue that PR was conquered by the left with the help of extra-parliamentary, revolutionary uprisings that pressured bourgeois parties to adopt PR. They also contend that in larger countries these uprisings were facilitated by military defeats in World War I, which weakened bourgeois incumbents. For the left, PR helped maximize its seat share and offered an electoral safeguard against the right. The franchise had the more indirect effect of electorally strengthening the left. (2004)

A&J offer an almost equally political explanation, except that they focus on the political instability caused by the franchise expansion. They argue that the sudden entry of large numbers of new voters created so much uncertainty about parties’ winning chances that politicians were unable to choose the electoral system most likely to maximize their parliamentary seats. Under such uncertainty, parties chose PR because it promised to minimize potential losses. A&J’s argument starts with the franchise, which they treat as an exogenous factor that requires no search for further antecedent causes.

Calvo’s actors are constrained by the geographic distribution of their votes and the efficiency with which different electoral formulae translate them into seats.[footnoteRef:4] He argues that parties will choose the electoral system that most efficiently translates the geographic distribution of their votes into seats. PR was adopted as a joint project of parties whose electoral support was territorially dispersed across a large number of districts. For such parties, PR improved the geographic efficiency of their votes compared to single-member plurality (SMP) or double-ballot systems. His explanation thus starts with the electoral geography just prior to the franchise expansion. [4: Jonathan Rodden makes a similar argument in an unpublished manuscript. I borrow the term “geographic efficiency of votes” from him. (n/d)]

Boix (2010) and Amel Ahmed (2013) agree with Calvo’s electoral geography argument but add elements that are largely contemporaneous with electoral geography. Boix argues that the geographic efficiency of votes depends not just on each party’s individual geographic vote distribution but also on those of its non-socialist coalition partners as well as on the left’s strength. When incumbent bourgeois parties faced a sufficiently serious left threat, they granted PR to the left to safeguard their own parliamentary seats. In considering these larger patterns of electoral competition, parties are no longer just office seekers but also policy seekers who are just as concerned about the policy implications of electoral systems as they are about their effects on maximizing their party’s seat share. (Benoit 2004) Ahmed agrees with much of Boix’s argument and additionally points out that in many countries SMP was not the status quo electoral system but constituted, together with PR, a new electoral safeguard against the left. Her two case studies illustrate this point. Given its explorative nature, her article does not generalize about the conditions under which one electoral system is chosen over the other, nor does she provide an overview of the full range of different originating electoral systems (i.e. those predating either PR or SMP). (See also Colomer 2007)

André Blais et al. (hereafter BDI, 2005) and Thomas Cusack et al. (hereafter CIS, 2007, 2010) offer the two most constrained accounts, in which parties are motivated by broader, less partisan goals. BDI contend that parties’ choices were constrained by widely held democratic norms about equality and fairness. They argue that PR’s ability to translate votes into seats more equitably was part of the same democratic norms and cross-national intellectual mobilization that pushed countries towards expanding the franchise and parliamentary sovereignty. The starting point for their argument thus predates that of Calvo, Boix, and Ahmed. They also explain the non-adoption of PR by the effects of FPTP systems. Such systems, as opposed to double-ballot systems, limited the number of parties under the regime censitaire to two. FPTP systems thus created partisan incentives to keep the status quo system that were too strong for the democratic norms favoring PR to overcome.

Finally, CIS offer the most complex and structural explanation, attributing the adoption of PR to the co-evolution of capitalism and forms of representation. They argue that countries with proto-corporatist legacies had labor markets that depended on close cooperation between labor and capital. They further contend that the combined effect of demographic changes resulting from industrialization and changes in the make-up of the electorate caused by the franchise expansion undermined cross-class cooperation in countries with proto-corporatist legacies. In response to these changes, PR became a joint project of capital and labor to re-establish the earlier close, labor-market-induced class cooperation and to make parliamentary decision-making more consensual. CIS thus locate their primary causes the farthest back in time and also view actors as highly constrained. (2007)

2.2. Data Access

Data access was thorough across the nine contributions to the PR literature. Data location was easy, as quantitative scholars carefully cited the sources of their data sets and did an adequate job footnoting their historical evidence. Qualitative scholars did an even better job by regularly adding page numbers. None of the works publicly shared their data, but this is explicable by the fact that such norms were not well established at the time of their publication. This section also concludes that DA-RT’s proposals for qualitative data sharing and active citation have limited pay-offs. Overall then, footnoting provides enough data access to make the controversial, data-file-inspired innovations recommended by DA-RT less necessary. (Büthe and Jacobs 2015; Hall 2016; Isaac 2015) Let me elaborate on those conclusions.

2.2.1. Data Location

Footnoting assists in locating the sources of the data sets used by the six quantitative works, as well as the historical evidence used by all nine works for their process tracing. I broadly define historical evidence as coming from works relying on primary or secondary historical material. All the other cited works were either theoretical or methodological and did not contain historical evidence. Overall, I identified 120 historical works cited in the nine contributions, accounting for 41.7% of all cited works. Those historical works, in turn, generated 150 historical citations that account for 46.7% of all citations. To compare across contributions, I looked at how many historical citations per 1000 published words each work contains. The higher the ratio, the more opportunities the reader has to access the original sources from which the data was generated. However, a citation only provides genuine access if it also contains a page number. So I also calculated the share of historical citations containing actual page numbers.
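The two measures reduce to simple arithmetic. The following minimal sketch uses made-up values and illustrative function names of my own, not code from any of the contributions:

def citations_per_1000_words(n_historical_citations, word_count):
    # Historical citations per 1,000 published words.
    return 1000 * n_historical_citations / word_count

def page_number_share(citations):
    # `citations` is a list of dicts such as {"has_page_number": True}.
    return sum(1 for c in citations if c["has_page_number"]) / len(citations)

# Hypothetical article: 24 historical citations in 12,000 words, 17 with page numbers.
cites = [{"has_page_number": i < 17} for i in range(24)]
print(citations_per_1000_words(len(cites), 12000))  # 2.0
print(round(page_number_share(cites), 2))           # 0.71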

Table 5 shows variation across the nine contributions in the ease with which evidence can be located. The six quantitative pieces all indicate the sources from which they generated their data sets. With respect to locating historical evidence, A&G and Ahmed constitute the end points of an access continuum. A&G’s 0.8 historical citations per 1000 words and 36% page number share make it very costly, if not downright impossible, to locate their evidence. They offer a series of one-to-two-paragraph country studies that frequently lack any footnotes. The 0.8 historical citations are particularly astonishing because A&G advance a strictly historical argument and cite no methodological or theoretical work. This lack of access makes it impossible to verify their factual claims when they deviate from those of other sources. By contrast, Ahmed’s 7.1 historical citations per 1000 words and 71% page numbering make locating evidence far less costly. The other seven works fall somewhere between these two extremes. With the exception of Boix (1999) and Calvo, they are conscientious about giving readers a precise page number. The absence of page numbers makes it time-consuming to locate the relevant evidence. The variations in historical citations are a bit more difficult to interpret because they also reflect how many words authors devoted to process tracing as opposed to discussing theory or methodology. It also is interesting that the second versions of Boix’s (2010) and CIS’s (2010) arguments were accompanied by a significant increase in data access.

2.2.2. Data Sharing

The issue of whether quantitative and qualitative data can be shared with the same ease has generated some of the most heated discussion. Table 6 makes clear that data sharing is minimal in the PR literature, reflecting the weakness of such norms until very recently. With the exception of Boix, none of the journals or scholars’ personal webpages make the data sets publicly available. CIS’s (2010) web appendix responds to the coding critiques of an earlier article. Also, none of the articles use active citations, which again is not surprising given that all the articles pre-date this data access innovation.

At first sight, data sharing is attractive because it dramatically reduces replication costs and provides a public good. On second thought, those benefits are far more direct for quantitative research, which produces readily shareable data files. It is less clear how qualitative data repositories or active citations could provide comparable benefits. Scholars who experimented with active citations report mixed experiences. They report that the increased explicitness of making their primary material available also made them more careful in handling their source material. However, most reported that these benefits were not enough to offset the time costs required for active citation. (Fairfield 2016; Snyder 2014; Trachtenberg 2015, 15) Arguably, active citation is problematic because it rests on an overly atomistic view of qualitative evidence and assumes a discrete textual location for each piece used to support causal inferences. This view is incongruent with the experience of qualitative research. Historians, for example, talk about concatenation to describe the process of linking different facts, dispersed across different locations in a document, or even across multiple documents, into a single, coherent piece of evidence. (Gaddis 2002, 20–30) It is unclear how active citation could readily accommodate the locational complexities of the factual constituent parts of an actual piece of evidence. Furthermore, active citation does not offer a ready solution for the problem of selective citation. The validity of the concatenation process that converts textually dispersed facts into broader evidence is conditional on the absence of counter-evidence. It is unclear how active citation helps demonstrate the absence of such counter-evidence, since it is premised on providing access to supporting evidence. For example, not all conservatives supported PR as Boix predicts. Active citation works well for texts that document conservatives supporting PR. It is less clear how it would make transparent counter-evidence that a scholar chooses to ignore. Appendix B illustrates the textual dispersion of evidence and the difficulty of demonstrating the absence of evidence with an example drawn from CIS (2010).

2.3. Production Transparency

The elements of production transparency varied across the PR literature. The quality of sources varies the least and is consistent across the nine contributions. Recording and coding transparency varies more and resulted in significant research errors. These errors are consequential because they involve the evidentiary basis used to generate hypotheses or test them. This section shows that giving footnotes the same standing as .do files increases the significance of production transparency.

2.3.1. Quality of Sources

The nine contributions neither discuss their sources nor differentiate them in terms of their quality. This does not turn out to be a problem because all the evidence is cited from highly ranked journals or university presses. Table 8 uses the ISI journal impact factor as well as the proximity of the sources to assess their quality. It shows that the quality of the sources is almost identically high across the nine contributions. Four contributions are published in the top-ranked American Political Science Review, while the others appeared in other top-ten journals or with Oxford University Press. There is even less variation in the impact factor of the journal articles that they cite. A&G are the one exception in that they do not cite any journal articles. Finally, all nine articles rely heavily on secondary sources and tertiary electoral handbooks, so there is no difference in the proximity of their cited sources. Ahmed is the only one using primary sources. That the inattention to the quality of sources does not matter in the PR literature might be attributable to the peer review process having successfully vetted the sources. It probably also is an artifact of the nine contributions relying almost exclusively on secondary rather than archival sources.

2.3.2. Recording & Coding

Production transparency expects scholars to translate facts into evidence in an accurate fashion and to use valid measures to quantify observations. Recording and coding frequently are complex and involve interpretive judgments. Production transparency stipulates that scholars document any such ambiguities to assure that subsequent fact checking does not turn up factual or coding errors. It is important to underscore that such fact checking is not meant to be a gotcha undertaking. Factual inaccuracies and coding errors are bound to occur, especially when intrepid political scientists engage in large-scale, comparative historical analysis. Factual and coding errors, however, become a problem when they are frequent, rest on invalid measurement instruments, or pertain to central theoretical claims.

The nine articles vary in the transparency of their recording and coding process. The six quantitative contributions differ in the explicitness with which they select their measurement instruments. None discusses any ambiguities in their coding or their recording. This suggests that their data production was unproblematic and should readily withstand any fact checking.

Table 9 summarizes the results of these fact-checking efforts, which are elaborated in Web Appendix C. BDI, A&J, and CIS advance brand new theoretical claims and develop new measures to collect data. They devote little time to demonstrating the validity of those measures. Boix (1999, 2010) and Calvo (2009) employ well-established measures that don’t raise any validity concerns. Except for CIS, all articles use electoral statistics and thus don’t face any coding problems. CIS’s labor market measure rests on interpretations of historical evidence that could not be consistently verified. (See Web Appendix G) Finally, fact checking the nine articles shows variations in their factual accuracy. I was unable to find any significant factual errors in Ahmed, Calvo, Boix (1999, 2010), and A&J. However, there were important factual errors in BDI, A&G, CIS, and Calvo that are discussed further in Web Appendix C.

2.3.3. Range of Sources

Besides their quality, sources also vary in other characteristics. They reflect different political perspectives, are grounded in distinct disciplines, employ different methodologies, are written at different points in time, or subscribe to distinct theoretical approaches. Each of these characteristics influences what sort of evidence a particular source is likely to contain and thereby how diverse the overall evidence will be. This diversity, in turn, influences the breadth of evidence against which a theory is tested and the likelihood of encountering counter-evidence.

The nine contributions pay no attention to the range of available sources, which ones they selected, and how their characteristics might differ. They consequently fail to acknowledge three important differences in the characteristics of their sources. These differences are reported in Table 10. First, I counted how many historical works each contribution cites. The larger the number of historical works, the more likely the evidence will reflect different political or even theoretical perspectives, for the reasons discussed in the first section. Second, I calculated the average publication age of the historical references to see whether a scholar focused mostly on recent work or also consulted older or even primary sources. Third, I refined the age measure by also counting works written between 1890 and 1920 that were contemporaneous with the events analyzed.
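These three measures amount to simple counting over the publication years of the cited historical works. A minimal sketch, with hypothetical inputs and an illustrative function name of my own:

def source_diversity(historical_pub_years, article_year):
    # Returns (number of historical works cited, their average publication age,
    # and the count of works contemporaneous with the events, i.e. 1890-1920).
    n = len(historical_pub_years)
    avg_age = sum(article_year - y for y in historical_pub_years) / n
    contemporaneous = sum(1 for y in historical_pub_years if 1890 <= y <= 1920)
    return n, avg_age, contemporaneous

# Hypothetical 2004 article citing four historical works:
print(source_diversity([1905, 1958, 1982, 1997], 2004))  # (4, 43.5, 1)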

The nine contributions fall into two groups. Boix (2010) and Ahmed have the most diverse data production; they cite a significantly larger number of historical works, which span a longer period and include works contemporaneous with the events. The other seven works draw on a less diverse evidentiary basis. It is difficult to link the diversity of sources directly to research errors. However, there is an interesting connection between A&G’s and CIS’s very low publication age and absence of contemporaneous sources, on the one hand, and the functionalist nature of their arguments, on the other. Contemporaneous sources permit reading history forward and avoid the hindsight bias or functionalism that comes with reading history backward. A&G’s and CIS’s arguments are functional because the factors explaining the origins of PR also indirectly explain the policy effects of PR that they elaborate elsewhere. (2004, 1–75; 2006) For A&G, a strong left pushed not only for PR but also for the redistributive policies enabled by PR. For CIS, capital-labor cooperation produced PR, which in turn facilitated social policies encouraging investment in the specialized skills desired by both industrialists and workers.

2.4. Analytical Transparency

Analytical transparency is low across the nine contributions when compared to production transparency and especially data access. This reflects the limited attention the quantitative works pay to the judgment-based, pre-testing stage of analysis. Most importantly, this lack of analytical transparency translated into significant research errors, particularly concerning the units of analysis, test construction, and temporal transparency.

2.4.1. Priors

The external validity of a test result is conditional on how many prior corroborating findings are available. The PR literature does very little to assess its new findings against earlier ones; it prefers instead what Jörg Friedrichs and Fritz Kratochwil call the “gladiator style of analysis, where one perspective goes forth and slays all the others.” (Cited in B&C 2014, 31) Boix (1999) serves as the focal explanation that everyone else tries to “slay”, in the process ignoring all the others.

Table 11 illustrates how the nine works varied in their engagement with prior scholarship by looking at the share of the prior relevant works a scholar cites, the share of those cited works they directly engage, and how many of the prior works report similar findings.

The PR literature grew steadily: Boix (1999) had five prior studies available, while Ahmed, writing most recently, had thirteen to cite. The share of those prior works actually cited varied. A&G do not cite a single work available at the time of their writing. A&J, CIS (2010), Boix (2010), and Calvo cite less than half the available works. The other authors cite more than 50%, with Ahmed the best read, citing 77%.

I further looked at whether scholars engage the cited works by theoretically critiquing them, identifying cases they fail to explain, or selecting control variables from them. The low share with which scholars engage prior works demonstrates a clear preference for testing over theorizing. It departs from B&C’s recommendation to be equally tough on all alternatives and to refuse to give an “unduly privileged status to one explanation by granting it a first mover advantage.” (B&C 2014, 24) BDI, A&G, and CIS advance entirely new theories and thus have no prior works to engage. It therefore would have been appropriate for them to point this out and underscore the speculative nature of their findings. By contrast, A&J engage the findings from post-communist PR adoptions, and Calvo draws on the electoral geography literature. They consequently can draw on prior support for some of their claims. Finally, Boix and Ahmed refine the left threat thesis first formulated by Braunias in 1932, thus increasing the number of prior findings. Overall, these variations in the engagement with the available foreknowledge suggest important differences in each theory’s external validity.

2.4.2. Specificity of Theories

Theories are not all equal because they vary in the number of their predictions and in the share of those predictions that are empirically tested. The nine contributions pay very different attention to these two issues. As Table 12 shows, the number of predictions gradually increased over the course of the PR literature, underscoring modest theory development. This increase was most evident in Boix (2010) and Ahmed (2011), which both expanded on Boix (1999). Table 12 also lists the share of the theorized predictions that individual works actually test. Here Boix (1999), BDI, and Calvo stand out because they make and test fewer predictions but, in exchange, test them over a larger number of cases. A&J, CIS (2010), and Ahmed’s case study approach allows them to test a higher share of their predictions, but over a smaller number of cases. A&G also rely on strictly qualitative analysis but test a much smaller share of their predictions. Overall, these variations in the specificity of theories are modest, and their effect on the validity of causal inferences seems more tentative than that of the priors.

2.4.3. Units of Analysis

The PR literature devotes no attention to its own or its competitors’ units of analysis even though their specification profoundly affects the validity of causal inferences. The choice of electoral systems can be studied at four possible levels: individual deputies (or factions), individual national parties, groups of parties (e.g. the right), or national party systems. None of the contributions choose individual deputies as their unit of analysis because, at the time of their writing, it was difficult to obtain the relevant district-level data. This unit would guarantee the most valid causal inferences. In the absence of district-level data, Boix (1999, 2010), CIS (2010), Calvo, and Ahmed analyze national parties. The inferences drawn from this unit of analysis are valid to the extent that parties are disciplined unitary actors and don’t face factional or regional variations in their preferences. Boix (1999), CIS (2010), and Calvo assume the uniformity of parties, while Ahmed and Boix (2010) discuss intra-party differences. BDI, A&J, and CIS (2007) focus on countries and assume that party preferences are uniform across all national parties. BDI believe that all parties shared the public’s desire for more equality; A&J claim that all parties faced the same level of uncertainty; and CIS (2007) assume that socialists and conservatives were equally concerned about labor market cooperation. These last three arguments commit the fallacy of division because they assume that what holds for a country also holds for its parties, and that parties have no preferences independent of those generated by broad national structures. CIS recognize this fallacy and correct it in their 2010 restatement, in which they shift the level of analysis down to the preferences of parties and interest groups. Finally, A&G differentiate between bourgeois and socialist parties and assume that their preferences are uniform across countries. However, they fail to empirically validate this claim or to refute the contrary evidence offered by Boix, Calvo, Ahmed, and Penadés (2008). Overall, the inattention to the units of analysis produces significant research errors in the case of A&G, CIS (2007), A&J, and BDI, and it raises modest doubts about Boix’s (1999), Calvo’s, and CIS’s (2010) conclusions. Only Ahmed (2013) and Boix (2010) are sufficiently transparent to give the reader confidence in the inferences drawn from parties’ preferences.

2.4.4. Test Construction

The robustness of causal inferences results from a three-cornered fight between a theory, its rivals, and the evidence, but actual test constructions vary significantly in how symmetrically they use evidence to adjudicate between test and alternative hypotheses. Table 13 captures this symmetry and the resulting test strength by analyzing three aspects of test construction. First, it explores how many variables used in previous tests a particular article replicates. Such replications are free-standing tests that slightly modify earlier tests or case selection to assess the robustness of earlier findings. Typically, replications focus on alternative explanations and try to discount earlier test results to increase confidence in the results of the new theory. Second, control variables incorporate elements of alternative theories into the test of the default theory rather than constructing two independent, parallel tests. Designating the prediction of an alternative theory a control variable implies an implicit prediction that this variable is unlikely to matter or, if it does, to a far lesser degree than the variables predicted by the test theory. In short, control variables test for potential confounding effects. Third, null variables differ from control variables only in that the tester claims that a prior prediction of an alternative theory won’t have any causal effect. Scholars designate variables as null variables by pointing to failed replications or theoretical flaws. In short, null variables have the theoretical status of noise.

The PR literature does not discuss the range of test constructions it entertained or why it selected one construction over another. Despite this lack of discussion, it is easy to explicate each contribution's actual test construction. As Table 13 shows, the nine contributions varied in the type and strength of their test constructions. A&G constructed the weakest test given their failure to evaluate any alternative hypothesis. A&J, CIS (2007), and Calvo built very strong tests that replicate, control, and make null predictions. BDI and Boix (2010) advance moderately strong tests. CIS (2010) replicated their earlier theory in the form of a detailed case study. Given this explicit process-tracing goal, the absence of controls or null predictions is not a problem. Ahmed's historical analysis of Belgium, the UK, and the US makes it difficult to use control variables and lessens the utility of replication. She therefore focused on identifying theoretical flaws in rival arguments and translating them into null predictions.

2.4.4. Temporal Transparency

Table 14 shows that the PR literature varies greatly in the attention it pays to three important temporal considerations: boundary conditions, temporal causal structure, and symmetry of causation. Boix (1999, 610, 622) is the only one to discuss boundary conditions, albeit in an inconsistent fashion. He begins his analysis by seeking to generalize Stein Rokkan's left-threat thesis beyond first wave democratizations, but then concludes by limiting its application to democratizations with already well-established party systems (i.e. first wave democratizations). BDI, A&G, CIS (2007, 2010), and Boix (2010) imply temporal boundary conditions by limiting their theoretical claims to the period between 1880 and 1920. A&G are a bit more explicit by differentiating between PR adoptions before and after World War I. Calvo and A&J eschew any boundary conditions; they use theories that explain post-communist electoral system choices without addressing potential contextual differences with the 1890–1920 period. There is even less attention to the temporal structure and causal symmetry of the arguments. With the exception of Ahmed's subsequent book-length analysis (Ahmed 2013) and, very partially, Boix, the remaining seven works all assume a short/short temporal structure and pay no attention to causal reversals.

How consequential is this lack of temporal transparency? Temporal transparency is intrinsically desirable because it increases confidence in conclusions. It matters particularly when the temporal attributes of the cases deviate from those hypothesized by a theory. Concretely, the limited temporal transparency of the PR literature matters to the extent that the temporal structure and causal symmetry of the cases deviate from the short/short structure and asymmetrical causation the theories stipulate. A quick look at the twenty or so cases used in the literature shows considerable divergence on this score.

Table 15 assesses the temporal structure by asking how proximate in time the franchise expansion, an important independent variable in all the theories, was to the eventual adoption of a PR law. It also evaluates the time horizon of the dependent variable as well as causal symmetry by analyzing how many PR bills failed prior to the adoption of a PR law and how many PR laws were reversed. The time horizon becomes longer the more failed bills precede an eventual PR law and the more often those laws are subsequently overturned.

Table 15 underscores important incongruities between the theories' temporal structures and those of the cases. There is considerable variation in the time horizons of the dependent variable. Only Sweden, Finland, the Netherlands, Austria, Germany, and to a lesser extent Belgium[footnoteRef:5] experienced a one-shot, short-term PR adoption, in which PR was adopted either with the first PR bill or shortly thereafter. By contrast, the time horizon of PR adoption in Denmark, Iceland, Norway, Switzerland, and France was far longer. Switzerland's 1918 adoption was preceded by two failed national PR referenda as well as various subnational PR reforms. France adopted a weak PR law in 1919 that had been preceded by unsuccessful PR bills in 1906 and 1912; the law itself was reversed in 1929. And in the three Nordic countries, the franchise was extended well before PR was introduced. [5: Two partial PR bills failed in 1893 and 1899 before a third bill finally became law later in 1899. (Carstairs 1980, 50–59)]

The timing of franchise expansion relative to PR adoption varied significantly as well, thus undermining the implicit theoretical claim of a short, and hence causally insignificant, lag between these two events. These time lags have important theoretical implications that the various explanations inadequately acknowledge.

Ahmed is the only one to acknowledge how the timing of the franchise affects the radicalism of the left and thereby also the PR preferences of the right. CIS's argument is so structural that the franchise timing has little impact on their actors' PR preferences. For A&J, BDI, Calvo, and Boix, however, the franchise timing has important, unacknowledged theoretical implications. A&J discuss the role that uncertainty plays in shaping actors' preferences. The long time lag between franchise expansion and PR adoption in eight countries must have significantly reduced the importance of uncertainty; presumably, actors in those countries had a fairly clear idea of their own and their competitors' electoral strength. In BDI's case, the overlooked time lag raises the question of why the same ideas that they argue motivated both the franchise and PR were politically decisive at such different points in time. The same question arises for those countries (Switzerland, Belgium, France, and Italy) in which the female franchise was adopted very late. In Boix's argument (1999, 2010), the long time lag between franchise and PR reforms reduced the left threat because the left grew gradually, was less radicalized by disenfranchisement, and was more prone to cooperate with liberals. (Marks et al. 2009) In Calvo's case, countries with an early franchise expansion experienced a much slower change in the geographic efficiency of their votes, allowing their parties to avail themselves of small-scale, incremental districting changes to adapt to the shifting electoral geography. Such changes might not have been politically feasible in instances of late and rapid change in electoral geography through sudden franchise expansion; in such cases a larger-scale solution such as adopting PR might have been called for. (Ahmed 2013, 16)

Conclusion: Aligning Epistemology and Publishing

The push for greater research transparency can be viewed as part of a broader epistemological shift away from logical positivism's narrow focus on the final, data-centric, and technical testing stage of social inquiry towards the epistemology of scientific realists, who place those test results in a broader, more iterative context that also includes discovery heuristics, theorizing, test construction, and multiple research cycles. (Abbott 2004; B&C 2014, 11; Bunge 1996) This paper's effort to broaden the existing DA-RT guidelines is part of this push because footnotes complement data files by extending transparency considerations from the testing to the pre-testing stages of social inquiry. This effort is merely a first step; footnotes and process tracing do not exhaust all possible transparency considerations and do not encompass the criteria used when working with human subjects. Much more work remains to be done to make transparency considerations fully pluralistic and to reach a genuinely broad consensus. Towards this long-term goal, this paper permits three interim conclusions.

First, the relative ease of fitting the informal transparency practices of footnoting into the formal DA-RT transparency categories underscores that those categories are ecumenical enough to accommodate a wide range of methodologies. There is no reason to believe that those categories could not also accommodate the transparency practices involved in human subjects research. (Bleich and Pekkanen 2013; Duneier 2011) The charge that DA-RT is inherently anti-pluralistic thus has to be rejected.

Second, the various transparency practices differ in their pay-offs and thus only marginally support the realists' broadside that all of DA-RT is little more than epistemological red tape. Broadly speaking, analytical transparency produced the highest pay-offs and data access the lowest, with production transparency falling somewhere in between. We also found variation in the pay-offs within each category.

Third, DA-RT’s pluralist potential and the pay-offs of analytical transparency point to a major misalignment between the epistemological benefits of research transparency and the incentive structures of the publishing process. On this point, the realists’ claim that the publishing process creates powerful camouflaging incentives is correct. Sky-high rejection rates, at times unpredictable and very lengthy peer reviews, shrinking word limits, the minimization of footnotes, the treatment of all hypotheses and tests as created equal, and the back-grounding of ontological assumptions all create an intellectual and logistical environment that is inhospitable to research transparency. It could even be argued that this environment still very much reflects a logical positivist epistemology and its narrow, strictly test-centric concept of research transparency.

If DA-RT is to succeed, it has to take better account of this misalignment. The disproportionate attention it pays to data access and its relative inattention to analytical transparency suggest that it is focusing on the element of research transparency where the misalignment is smallest but where the benefits are also the most negligible. And in under-estimating this misalignment, it places the burdens of transparency on individual scholars without acknowledging the broader professional and logistical obstacles they face. This might explain the significant pushback it is receiving from many different quarters. Other disciplines have been more attentive to this misalignment and have begun to experiment with new publication formats. Sociological Science and the Public Library of Science (PLoS), for example, provide quicker turn-arounds, offer more generous word limits, discourage “purely symbolic citations” (aka pageless footnotes), invite submissions from various stages of the research cycle (not just standard hypothesis testing), and reduce the importance of pre-publication quality control by complementing it with post-publication online discussion and a broad range of impact metrics. JSTOR and Hypothes.is are exploring the use of social annotation to encourage micro-targeted, post-publication discussion of scholarly works. While many of these innovations are still very recent and their merits still uncertain, they at least point a way forward to better aligning the desire for more research transparency with the broader incentives of the production of knowledge.


Bibliography:

Abbott, Andrew. 2004. Methods of Discovery: Heuristics for the Social Sciences. New York: W.W. Norton.

Ahmed, Amel. 2013. Democracy and the Politics of Electoral System Choice: Engineering Electoral Dominance. Cambridge University Press.

Alesina, Alberto, and Edward L. Glaeser. 2004. Fighting Poverty in the US and Europe: a World of Difference. Oxford: Oxford University Press.

Andrews, Josephine T., and Robert W. Jackman. 2005. “Strategic Fools: Electoral Rule Choice under Extreme Uncertainty.” Electoral Studies 24(1): 65–84.

APSA. 2016. “Guidelines for Data Access and Research Transparency for Qualitative Research in Political Science.” Comparative Politics Newsletter 26(1): 13–21.

Beach, Derek, and Rasmus Brun Pedersen. 2013. Process-tracing Methods: Foundations and Guidelines. Ann Arbor: University of Michigan Press.

Becker, Howard. 2007. Telling About Society. University of Chicago Press.

Bennett, Andrew, and Jeffrey Checkel. 2014. “Process Tracing: from Philosophical Roots to Best Practices.” In Process Tracing, eds. Andrew Bennett and Jeffrey Checkel. New York: Cambridge University Press, p. 3–38.

Benoit, Kenneth. 2004. “Models of Electoral System Change.” Electoral Studies 23(3): 363–389.

Blais, André, Agnieszka Dobrzynska, and Indridi H. Indridason. 2005. “To Adopt or Not to Adopt Proportional Representation: The Politics of Institutional Choice.” British Journal of Political Science 35(1): 182–190.

Bleich, Erik, and Robert Pekkanen. 2013. “How to Report Interview Data.” In Interview Research in Political Science, ed. Layna Moseley. Ithaca: Cornell University Press, p. 95–116.

Boix, Carles. 2010. “Electoral Markets, Party Strategies, and Proportional Representation.” American Political Science Review 104(2): 404–413.

Boix, Carles. 1999. “Setting the Rules of the Game: the Choice of Electoral Systems in Advanced Democracies.” American Political Science Review 93(3): 609–24.

Bunge, Mario. 1996. Finding Philosophy in Social Science. Yale University Press.

Büthe, Tim, and Alan Jacobs, eds. 2015. “Symposium: Transparency in Qualitative and Multi-Method Research.” Qualitative and Multi-Method Research 13(1).

Calvo, Ernesto. 2009. “The Competitive Road to Proportional Representation: Partisan Biases and Electoral Regime Change under Increasing Party Competition.” World Politics 61(2): 254–295.

Carstairs, Andrew McLaren. 1980. A Short History of Electoral Systems in Western Europe. Routledge.

Colomer, Josep M. 2007. “On the Origins of Electoral Systems and Political Parties.” Electoral Studies 26(2): 262–273.

Cusack, Thomas, Torben Iversen, and David Soskice. 2010. “Coevolution of Capitalism and Political Representation: The Choice of Electoral Systems.” American Political Science Review 104(2): 393–403.

Cusack, Thomas R., Torben Iversen, and David Soskice. 2007. “Economic Interests and the Origins of Electoral Systems.” American Political Science Review 101(3): 373–391.

Daston, Lorraine. 1991. “Marvelous Facts and Miraculous Evidence in Early Modern Europe.” Critical Inquiry 18(1): 93–124.

Desch, Michael. 2015. “Technique Trumps Relevance: The Professionalization of Political Science and the Marginalization of Security Studies.” Perspectives on Politics 13(2): 377–393.

Duneier, Mitchell. 2011. “How Not to Lie with Ethnography.” Sociological Methodology 41(1): 1–11.

Elman, Colin, and Diana Kapiszewski. 2014. “Data Access and Research Transparency in the Qualitative Tradition.” PS: Political Science & Politics 47(1): 43–47.

Evans, Richard. 1997. In Defense of History. New York: WW Norton.

Evera, Stephen Van. 1997. Guide to Methods for Students of Political Science. Cornell University Press.

Fairfield, Tasha. 2016. “Reflections on Analytical Transparency in Process Tracing.” Comparative Politics Newsletter 26(1): 41–47.

Fanelli, Daniele, and Wolfgang Glänzel. 2013. “Bibliometric Evidence for a Hierarchy of the Sciences.” PLoS ONE 8(6): e66938.

Fischer, David Hackett. 1970. Historians’ Fallacies. Toward a Logic of Historical Thought. New York: Harper & Row.

Fischhoff, Baruch. 1982. “For those Condemned to Study the Past: Heuristics and Biases in Hindsight.” In Judgment under Uncertainty: Heuristics and Biases, eds. Daniel Kahneman, Paul Slovic, and Amos Tversky. Cambridge: Cambridge University Press, p. 335–351.

Gaddis, John Lewis. 2002. The Landscape of History. Oxford: Oxford University Press.

Marks, Gary, Heather Mbaye, and Hyung Min Kim. 2009. “Radicalism or Reformism? Socialist Parties before World War I.” American Sociological Review 74(4): 615–635.

Geddes, Barbara. 2003. Paradigms and Sand Castles: Theory Building and Research Design in Comparative Politics. Ann Arbor: University of Michigan Press.

Gerring, John. 2007. Case Study Research. Principles and Practices. Cambridge: Cambridge University Press.

Gerring, John. 2012. Social Science Methodology: 2nd Edition. New York: Cambridge University Press.

Golden, Matt, and Sona Golden, eds. 2016. “Symposium: Data Access and Research Transparency.” Comparative Politics Newsletter 26(1).

Grafton, Anthony. 1999. The Footnote: A Curious History. Harvard University Press.

Hall, Peter. 2003. “Aligning Ontology and Methodology in Comparative Politics.” In Comparative Historical Analysis in the Social Sciences, eds. James Mahoney and Dietrich Rueschemeyer. Cambridge: Cambridge University Press, p. 373–406.

Hall, Peter. 2016. “Transparency, Research Integrity, and Multiple Methods.” Comparative Politics Newsletter 26(1): 28–32.

Howell, Martha, and Walter Prevenier. 2001. From Reliable Sources: An Introduction to Historical Methods. Ithaca: Cornell University Press.

Isaac, Jeffrey C. 2015. “For a More Public Political Science.” Perspectives on Politics 13(2): 269–283.

Iversen, Torben, and David Soskice. 2006. “Electoral Institutions and the Politics of Coalitions.” American Political Science Review 100(2): 165–181.

Jackson, Brooks, and Kathleen Hall Jamieson. 2007. unSpun: Finding Facts in a World of Disinformation. Random House.

Jacobs, Alan. 2014. “Process Tracing the Effects of Ideas.” In Process Tracing, eds. Andrew Bennett and Jeffrey Checkel. New York: Cambridge University Press, p. 40–73.

King, Gary, Robert O. Keohane, and Sidney Verba. 1994. Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton: Princeton University Press.

Kittel, Bernhard. 2006. “A Crazy Methodology? On the Limits of Macro-quantitative Social Science Research.” International Sociology 21(5): 647–677.

Lieberman, Evan. 2010. “Bridging the Qualitative-Quantitative Divide: Best Practices in the Development of Historically Oriented Replication Databases.” Annual Review of Political Science 13: 37–59.

Lieberman, Evan S. 2016. “Can the Biomedical Research Cycle be a Model for Political Science?” Perspectives on Politics.

Lieberson, Stanley. 1985. Making It Count: The Improvement of Social Theory and Research. Berkeley: University of California Press.

Lupia, Arthur, and Colin Elman. 2014. “Openness in Political Science: Data Access and Research Transparency.” PS: Political Science & Politics 47(1): 19–42.

Lustick, Ian. 1996a. “History, Historiography, and Political Science: Multiple Historical Records and the Problem of Selection Bias.” The American Political Science Review 90(3): 605–618.

Lustick, Ian. 1996b. “Read My Footnotes.” APSA-CP Newsletter 7(1): 6 & 10.

Moravcsik, Andrew. 2010. “Active Citation: A Precondition for Replicable Qualitative Research.” PS: Political Science and Politics 43(1): 29.

Pierson, Paul. 2003. “Big, Slow-Moving and Invisible: Macrosocial Processes in the Study of Comparative Politics.” In Comparative Historical Analysis in the Social Sciences, eds. James Mahoney and Dietrich Rueschemeyer. Cambridge: Cambridge University Press, p. 177–207.

Rodden, Jonathan. “Why Did Western Europe Adopt Proportional Representation? A Political Geography Explanation.” Unpublished manuscript.

Rokkan, Stein. 1968. “Elections.” In International Encyclopedia of the Social Sciences, ed. David Sills. New York: MacMillan, p. 6–19.

Rueschemeyer, Dietrich. 2003. “Can One or a Few Cases Yield Theoretical Gains?” In Comparative Historical Analysis in the Social Sciences, eds. James Mahoney and Dietrich Rueschemeyer. Cambridge: Cambridge University Press, p. 305–336.

Schwartz-Shea, Peregrine, and Dvora Yanow. 2016. “Legitimizing Political Science or Splitting the Discipline? Reflections on DA-RT and the Policy-making Role of a Professional Association.” Politics & Gender 12(3): 1–19.

Simkin, Mikhail, and Vwani Roychowdhury. 2006. “Do you sincerely want to be cited? Or: read before you cite.” Significance 3(4): 179–181.

Simmons, Joseph P., Leif D. Nelson, and Uri Simonsohn. 2011. “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22(11): 1359–1366.

Snyder, Jack. 2014. “Active Citation: In Search of Smoking Guns or Meaningful Context?” Security Studies 23(4): 708–714.

Steel, C. M. 1996. “Read before you cite.” The Lancet 348(9021): 144.

Stinchcombe, Arthur L. 1987. Constructing Social Theories. University of Chicago Press.

Trachtenberg, Marc. 2006. The Craft of International History: a Guide to Method. Princeton: Princeton University Press.

Trachtenberg, Marc. 2015. “Transparency in Practice: Using Written Sources.” Qualitative and Multi-Method Research Newsletter 13(1): 13–17.

Walt, Stephen, and John J. Mearsheimer. 2013. “Leaving Theory Behind: Why Hypothesis Testing Has Become Bad for IR.” European Journal of International Relations 19(3): 427–57.

Wheelan, Charles. 2013. Naked Statistics. New York: W.W. Norton.

Yom, Sean. 2015. “From Methodology to Practice: Inductive Iteration in Comparative Research.” Comparative Political Studies 48(5): 616–644.