Validating Wordscores


Bastiaan Bruinsma Kostas Gemenis

Universiteit Twente

5th EPSA General Conference, Vienna, 25-27 June 2015


Computer assisted methods for text analysis

analyzing massive collections of text has been essentially impossible for all but the most well-funded projects.

We show how automated content methods can make possible the previously impossible in political science: the systematic analysis of large-scale text collections without massive funding support. Across all subfields of political science, scholars have developed or imported methods that facilitate substantively important inferences about politics from large text collections. We provide a guide to this exciting area of research, identify common misconceptions and errors, and offer guidelines on how to use text methods for social scientific research.

We emphasize that the complexity of language implies that automated content analysis methods will never replace careful and close reading of texts. Rather, the methods that we profile here are best thought of as amplifying and augmenting careful reading and thoughtful analysis. Further, automated content methods are incorrect models of language. This means that the performance of any one method on a new data set cannot be guaranteed, and therefore validation is essential when applying automated content methods. We describe best practice validations across diverse research objectives and models.

Before proceeding we provide a road map for our tour. Figure 1 provides a visual overview of automated content analysis methods and outlines the process of moving from collecting texts to applying statistical methods. This process begins at the top left of Fig. 1, where the texts are initially collected. The burst of interest in automated content methods is partly due to the proliferation of easy-to-obtain electronic texts. In Section 3, we describe document collections which political scientists have successfully used for automated content analysis and identify methods for efficiently collecting new texts.

With these texts, we overview methods that accomplish two broad tasks: classification and scaling. Classification organizes texts into a set of categories. Sometimes researchers know the categories beforehand. In this case, automated methods can minimize the amount of labor needed to classify documents. Dictionary methods, for example, use the frequency of key words to determine a document’s class (Section 5.1). But applying dictionaries outside the domain for which they were developed can lead to serious errors. One way to improve upon dictionaries are

Fig. 1 An overview of text as data methods.

Justin Grimmer and Brandon M. Stewart



Wordscores

- Originally proposed by Laver, Benoit & Garry (2003)

- Popular tool (869 citations on Google Scholar)

- Developed for political manifestos, but also used to study:
  - Party mergers, electoral coalitions, policy preferences, speeches, reports from US state lotteries, Chinese newspaper articles, public statements by US Senators, open-ended questions ...

- Attempts at validation are rather limited


How Wordscores Works
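
As a rough illustration of the procedure in Laver, Benoit & Garry (2003), the sketch below scores words from reference texts with known positions and then scores a "virgin" text as the mean score of its words. The function names, the token-list input format and the toy texts are assumptions made for this example only; the rescaling of raw virgin scores (the LBG and MV transformations) is not shown.

```python
from collections import Counter


def relative_freqs(tokens):
    """Relative frequency of each word in one tokenized text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def word_scores(reference_texts, reference_scores):
    """Score each word as the expected reference position given that word.

    reference_texts: dict name -> list of tokens (hypothetical input format)
    reference_scores: dict name -> a priori position on one dimension
    """
    freqs = {name: relative_freqs(toks) for name, toks in reference_texts.items()}
    vocab = set().union(*freqs.values())
    scores = {}
    for word in vocab:
        total = sum(f.get(word, 0.0) for f in freqs.values())
        # P(reference r | word w) = F_wr / sum over r of F_wr
        scores[word] = sum(
            (f.get(word, 0.0) / total) * reference_scores[name]
            for name, f in freqs.items()
        )
    return scores


def score_virgin_text(tokens, scores):
    """Raw virgin-text score: mean word score over the words that have a score."""
    scored = [scores[t] for t in tokens if t in scores]
    if not scored:
        raise ValueError("virgin text shares no words with the reference texts")
    return sum(scored) / len(scored)


# Toy data, invented for illustration.
refs = {
    "left": ["tax", "tax", "welfare"],
    "right": ["tax", "market", "market"],
}
positions = {"left": -1.0, "right": 1.0}

ws = word_scores(refs, positions)
raw = score_virgin_text(["tax", "market", "market", "unseen"], ws)
```

In this toy case "welfare" scores -1, "market" scores +1 and "tax" lands in between, while the unseen word is simply ignored. Because frequent, uninformative words pull every text toward the middle, raw virgin scores tend to bunch together, which is why a rescaling step is normally applied afterwards.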


Previous attempts at validation

- Mostly against CMP data, though Benoit & Laver (2007) advise against this

- Only assess criterion validity

- Only assess ordinal placement (Hjorth et al. 2015)

- Only use Spearman’s ρ or Pearson’s r (and thus no assessment of systematic measurement error)
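
The point about systematic measurement error can be made concrete with a small sketch. The placements below are made-up numbers on a 0 to 20 scale, and `pearson_r` is a helper written for this example. The estimates compress the scale and shift every party by a constant, yet the correlation stays perfect (and so would Spearman’s ρ, since the ranking is unchanged).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical expert placements (invented for illustration).
expert = [2.0, 5.0, 9.0, 14.0, 18.0]
# Estimates with a systematic error: compressed scale plus a constant shift.
estimate = [0.25 * e + 8.0 for e in expert]

r = pearson_r(expert, estimate)
largest_error = max(abs(e - s) for e, s in zip(expert, estimate))
```

Here `r` equals 1 up to rounding even though the largest placement error is 6.5 points on a 20-point scale, so a high correlation alone says nothing about whether the estimates sit at the right positions.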


Replication of the original Laver et al. article

Table 1: Replication of the original scores

[Table 1, shown as a grid of plots. Row labels: Laver et al. (2003); 23-Jun-2009 (Stata version); Laver et al. (2003) Replication Material. Column labels (number of parties): 5 parties; 7 parties. Each plot places the parties (DL, Labour, FF, FG, PD, plus SF and Greens in the 7-party plots) on the EC and SO dimensions, on an axis running from 0 to 20.]


Hjorth et al. validation

[Figure: one panel per election year (1945, 1950, 1953, 1957, 1960, 1964, 1966, 1968, 1971, 1973, 1977, 1979, 1981, 1984, 1987, 1988, 1990, 1994, 1998, 2001, 2005, 2007). Axis labels as extracted: ws_rank and exp, each running from low to high.]


Study Design

- Documents
  - Using 2004 Euromanifestos to score 2009 Euromanifestos
  - Euromanifestos obtained from the Manifesto Project Database

- Reference scores
  - Chapel Hill Expert Study (2002), Benoit & Laver Expert Survey (2003-2004), Euromanifestos Project (2004)

- Comparison
  - Chapel Hill Expert Study (2010), EU Profiler (2009), Euromanifestos Project (2009)

- Analysis
  - Use Lin’s Concordance Correlation Coefficient instead of Spearman’s ρ or Pearson’s r
  - 25 countries/territories × 4 dimensions × 3 reference scores × 2 transformations = 600 analyses
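
A minimal sketch of Lin’s Concordance Correlation Coefficient follows, using the usual population (1/n) moments. The function name and the example placements are assumptions for illustration and are not taken from the study’s data. Unlike Pearson’s r, the coefficient is penalised when the two sets of scores differ in location or scale.

```python
def concordance_cc(x, y):
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Covariance in the numerator; the squared mean difference in the
    # denominator penalises a shift in location.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical placements on a 0-20 scale (invented for illustration).
benchmark = [2.0, 5.0, 9.0, 14.0, 18.0]
# Perfectly correlated with the benchmark, but compressed and shifted.
estimate = [0.25 * b + 8.0 for b in benchmark]

ccc_same = concordance_cc(benchmark, benchmark)
ccc_biased = concordance_cc(benchmark, estimate)

# Size of the design described above.
n_analyses = 25 * 4 * 3 * 2
```

With these made-up numbers the coefficient is 1 for identical scores and roughly 0.46 for the compressed and shifted estimates, even though the latter correlate perfectly with the benchmark.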


Types of validity

Following Carmines & Zeller (1979):

- Content Validity
  - Does the method represent all facets of a construct?

- Construct Validity
  - Does the method correlate with other measures reflecting the same concept?

- Criterion Validity
  - Does the method behave as expected within a given theoretical context?


Content validity for EU Integration

[Figure: density of mean word relevance, one panel per party (BNP, CONSERVATIVES, GREENS, LABOUR, LIBDEM, PC, SNP, UKIP) plus a Total panel. x-axis: word relevance (mean), 0 to 1; y-axis: Density.]


Construct validity

[Figure: two panels, McFadden's R Squared and Count R Squared (x-axis 0 to 1), plotted by Transformation (LBG, MV). Legend: Reference scores from BL, CHES, EMP.]


Criterion validity

[Figure: EU Integration Dimension. Four panels: LBG Transformation, Per Country Rescaling; LBG Transformation, Whole Dimension Rescaling; MV Transformation, Per Country Rescaling; MV Transformation, Whole Dimension Rescaling. x-axis: Concordance Correlation Coefficient, 0 to 1; y-axis: Compared to (CHES, EUP, EMP). Legend: Reference scores from BL, CHES, EMP.]


Conclusion

- No serious validation of Wordscores up till now

- This validation found it lacking on content, construct and criterion validity

- Wordscores should not be used to estimate parties’ policy positions using electoral manifestos as reference and virgin texts


Outlook

- Wordscores might still be useful in other applications where the assumptions of ideal point estimation for words might be approximated

- However, a case-by-case validation should be applied

