Validating Wordscores
Bastiaan Bruinsma Kostas Gemenis
Universiteit Twente
5th EPSA General Conference, Vienna, 25-27 June 2015
Bruinsma, Gemenis Validating Wordscores
Computer assisted methods for text analysis
analyzing massive collections of text has been essentially impossible for all but the most well-funded projects.
We show how automated content methods can make possible the previously impossible in political science: the systematic analysis of large-scale text collections without massive funding support. Across all subfields of political science, scholars have developed or imported methods that facilitate substantively important inferences about politics from large text collections. We provide a guide to this exciting area of research, identify common misconceptions and errors, and offer guidelines on how to use text methods for social scientific research.
We emphasize that the complexity of language implies that automated content analysis methods will never replace careful and close reading of texts. Rather, the methods that we profile here are best thought of as amplifying and augmenting careful reading and thoughtful analysis. Further, automated content methods are incorrect models of language. This means that the performance of any one method on a new data set cannot be guaranteed, and therefore validation is essential when applying automated content methods. We describe best practice validations across diverse research objectives and models.
Before proceeding we provide a road map for our tour. Figure 1 provides a visual overview of automated content analysis methods and outlines the process of moving from collecting texts to applying statistical methods. This process begins at the top left of Fig. 1, where the texts are initially collected. The burst of interest in automated content methods is partly due to the proliferation of easy-to-obtain electronic texts. In Section 3, we describe document collections which political scientists have successfully used for automated content analysis and identify methods for efficiently collecting new texts.
With these texts, we overview methods that accomplish two broad tasks: classification and scaling. Classification organizes texts into a set of categories. Sometimes researchers know the categories beforehand. In this case, automated methods can minimize the amount of labor needed to classify documents. Dictionary methods, for example, use the frequency of key words to determine a document’s class (Section 5.1). But applying dictionaries outside the domain for which they were developed can lead to serious errors. One way to improve upon dictionaries are
Fig. 1 An overview of text as data methods.
Excerpt from Justin Grimmer and Brandon M. Stewart (http://pan.oxfordjournals.org/)
Wordscores
- Originally proposed by Laver, Benoit & Garry (2003)
- Popular tool (869 citations on Google Scholar)
- Developed for political manifestos, but also used to study:
  - Party mergers, electoral coalitions, policy preferences, speeches, reports from US state lotteries, Chinese newspaper articles, public statements by US Senators, open-ended questions ...
- Attempts at validation are rather limited
How Wordscores Works
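A minimal sketch of the scoring step, following the description in Laver, Benoit & Garry (2003): each word is scored as the average of the reference positions, weighted by the probability of reading each reference text given that word, and a virgin text is scored as the mean word score of its scored words. The function names, the toy texts, and the positions of -1 and +1 are illustrative assumptions, not material from the slides.

```python
from collections import Counter


def word_scores(ref_texts, ref_positions):
    """Score every word found in the reference texts.

    ref_texts: list of token lists, one per reference text.
    ref_positions: known position A_r of each reference text.
    """
    # Relative frequency F_wr of each word within each reference text.
    rel_freqs = []
    for tokens in ref_texts:
        counts = Counter(tokens)
        total = sum(counts.values())
        rel_freqs.append({w: c / total for w, c in counts.items()})

    vocab = set().union(*rel_freqs)
    scores = {}
    for w in vocab:
        f = [rf.get(w, 0.0) for rf in rel_freqs]
        denom = sum(f)
        # P(r | w) = F_wr / sum_r F_wr, and S_w = sum_r P(r | w) * A_r.
        scores[w] = sum(f_r / denom * a for f_r, a in zip(f, ref_positions))
    return scores


def text_score(tokens, scores):
    """Raw score of a virgin text: mean word score over its scored words."""
    scored = [scores[w] for w in tokens if w in scores]
    if not scored:
        raise ValueError("virgin text shares no words with the reference texts")
    return sum(scored) / len(scored)


# Toy example (made-up texts and positions).
left = ["welfare", "welfare", "tax"]
right = ["market", "market", "tax"]
scores = word_scores([left, right], [-1.0, 1.0])

virgin = ["welfare", "tax", "tax", "europe"]  # "europe" is unscored and ignored
print(round(text_score(virgin, scores), 3))  # -0.333
```

Words shared equally by both reference texts ("tax") score at the midpoint and pull the virgin text towards the centre, which is why raw virgin scores tend to cluster and are usually rescaled before interpretation.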
Previous attempts at validation
- Mostly against CMP data though Benoit & Laver (2007) advise against this
- Only assess criterion validity
- Only assess ordinal placement (Hjorth et al. 2015)
- Only use Spearman’s ρ or Pearson’s r (and thus no assessment of systematic measurement error)
Replication of the original Laver et al. article
Table 1: Replication of the original scores
[Table of plots; contents not recoverable from the transcript. Column headings: Stata Version; Number of Parties (5 parties, 7 parties). Row labels as extracted: 0.36; Laver et al. (2003); 23-Jun-2009; Laver et al. (2003) Replication Material. Each panel places DL, Labour, FG, FF and PD (plus SF and Greens in the 7-party panels) on the EC and SO dimensions, on a scale from 0 to 20.]
Hjorth et al. validation
[Figure: one panel per year (1945, 1950, 1953, 1957, 1960, 1964, 1966, 1968, 1971, 1973, 1977, 1979, 1981, 1984, 1987, 1988, 1990, 1994, 1998, 2001, 2005, 2007). Axis labels as extracted: "ws_ rank" and "exp", each running from low to high.]
Study Design
- Documents
  - Using 2004 Euromanifestos to score 2009 Euromanifestos
  - Euromanifestos obtained from the Manifesto Project Database
- Reference scores
  - Chapel Hill Expert Study (2002), Benoit & Laver Expert Survey (2003-2004), Euromanifestos Project (2004)
- Comparison
  - Chapel Hill Expert Study (2010), EU Profiler (2009), Euromanifestos Project (2009)
- Analysis
  - Use Lin’s Concordance Correlation Coefficient instead of Spearman’s ρ or Pearson’s r
  - 25 countries/territories × 4 dimensions × 3 reference scores × 2 transformations = 600 analyses
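To illustrate why the concordance coefficient is preferred here, the sketch below computes Pearson's r and Lin's Concordance Correlation Coefficient on made-up scores in which the estimates sit exactly 3 points above the expert placements. It uses the standard definition of Lin's coefficient with population (1/n) moments. The data and function names are illustrative assumptions, not results from the study.

```python
from math import sqrt
from statistics import fmean


def _moments(x, y):
    """Population means, variances and covariance of two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    var_x = fmean([(a - mx) ** 2 for a in x])
    var_y = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return mx, my, var_x, var_y, cov


def pearson_r(x, y):
    _, _, var_x, var_y, cov = _moments(x, y)
    return cov / sqrt(var_x * var_y)


def lins_ccc(x, y):
    """Lin's CCC: 2*cov / (var_x + var_y + (mean_x - mean_y)**2)."""
    mx, my, var_x, var_y, cov = _moments(x, y)
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


# Made-up example: estimates are systematically 3 points too high.
expert = [2.0, 4.0, 6.0, 8.0]
estimated = [e + 3.0 for e in expert]

print(round(pearson_r(expert, estimated), 3))  # 1.0
print(round(lins_ccc(expert, estimated), 3))   # 0.526
```

Pearson's r reports perfect association because it is invariant to shifts in location and scale, while the concordance coefficient drops because the squared mean difference enters its denominator. That difference is the systematic measurement error that rank or linear correlations cannot detect.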
Types of validity
Following Carmines & Zeller (1979):
- Content Validity
  - Does the method represent all facets of a construct?
- Construct Validity
  - Does the method correlate with other measures reflecting the same concept?
- Criterion Validity
  - Does the method behave as expected within a given theoretical context?
Content validity for EU Integration
[Figure: density plots of mean word relevance (x-axis from 0 to 1, y-axis density), one panel each for BNP, CONSERVATIVES, GREENS, LABOUR, LIBDEM, PC, SNP, UKIP and Total.]
Construct validity
[Figure: two panels, McFadden's R Squared and Count R Squared (x-axis from 0 to 1), by transformation (LBG, MV), with reference scores from BL, CHES and EMP.]
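The slides do not spell out the two transformations, so the sketch below assumes that LBG refers to the rescaling in Laver, Benoit & Garry (2003), which stretches the raw virgin scores to the dispersion of the reference scores, and that MV refers to the Martin & Vanberg transformation, which maps raw scores linearly so that two anchor reference texts recover their assigned positions. The function names, the use of population standard deviations, and the numbers are illustrative assumptions.

```python
from statistics import fmean, pstdev


def lbg_transform(raw_virgin, ref_positions):
    """Rescale raw virgin scores to the spread of the reference positions.

    S* = (S_v - mean(S_v)) * (SD_ref / SD_virgin) + mean(S_v)
    """
    mean_v = fmean(raw_virgin)
    ratio = pstdev(ref_positions) / pstdev(raw_virgin)
    return [(s - mean_v) * ratio + mean_v for s in raw_virgin]


def mv_transform(raw_virgin, raw_ref1, raw_ref2, a_ref1, a_ref2):
    """Map raw scores onto the original metric using two anchor texts.

    raw_ref1, raw_ref2: raw scores of the two anchor reference texts.
    a_ref1, a_ref2: positions originally assigned to those texts.
    """
    scale = (a_ref2 - a_ref1) / (raw_ref2 - raw_ref1)
    return [(s - raw_ref1) * scale + a_ref1 for s in raw_virgin]


# Made-up raw scores clustered around 10 on a 0 to 20 scale.
raw = [9.0, 10.0, 11.0]
refs = [2.0, 10.0, 18.0]

print([round(v, 6) for v in lbg_transform(raw, refs)])  # [2.0, 10.0, 18.0]
print(mv_transform(raw, 9.5, 10.5, 2.0, 18.0))          # [-6.0, 10.0, 26.0]
```

In this toy case the LBG rescaling depends on the set of virgin texts being scored together, whereas the MV mapping depends only on the two anchors and can place virgin texts outside the range of the reference positions.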
Criterion validity
[Figure: EU Integration Dimension. Concordance Correlation Coefficient (x-axis from 0 to 1) compared to CHES, EUP and EMP, with reference scores from BL, CHES and EMP. Four panels: LBG Transformation with Per Country Rescaling; LBG Transformation with Whole Dimension Rescaling; MV Transformation with Per Country Rescaling; MV Transformation with Whole Dimension Rescaling.]
Conclusion
- No serious validation of Wordscores up till now
- This validation found it lacking on content, construct and criterion validity
- Wordscores should not be used to estimate parties’ policy positions using electoral manifestos as reference and virgin texts
Outlook
- Wordscores might still be useful in other applications where the assumptions of ideal point estimation for words might be approximated
- However, a case-by-case validation should be applied