Beyond Binary Labels: Political Ideology Prediction of Twitter Users. Daniel Preoţiuc-Pietro, Bloomberg LP. Joint work with Ye Liu (NUS), Daniel J Hopkins (Political Science), Lyle Ungar (CS) while at the University of Pennsylvania. 2 August 2017.


Page 1: Beyond Binary Labels: Political Ideology Prediction of Twitter Usersdanielpr/files/moderates17acl_pres.pdf · Beyond Binary Labels: Political Ideology Prediction of Twitter Users

Beyond Binary Labels: Political Ideology Prediction of Twitter Users

Daniel Preoţiuc-Pietro, Bloomberg LP

Joint work with Ye Liu (NUS), Daniel J Hopkins (Political Science), Lyle Ungar (CS) while at the University of Pennsylvania

2 August 2017


Motivation

User attribute prediction from text is successful:

- Age (Rao et al. 2010 ACL)
- Gender (Burger et al. 2011 EMNLP)
- Location (Eisenstein et al. 2010 EMNLP)
- Personality (Schwartz et al. 2013 PLoS One)
- Impact (Lampos et al. 2014 EACL)
- Political Orientation (Volkova et al. 2014 ACL)
- Mental Illness (Coppersmith et al. 2014 ACL)
- Occupation (Preoţiuc-Pietro et al. 2015 ACL)
- Income (Preoţiuc-Pietro et al. 2015 PLoS One)

... and useful in many applications.


Political Ideology & Text

Hypothesis:

Political ideology of a user is disclosed through language use

- partisan political mentions or issues
- cultural differences


Previous CS/NLP research used data sets with user labels identified through:

1. User descriptions

H1 Users are far more likely to be politically engaged


2. Partisan Hashtags

H2 The prediction problem was so far over-simplified


3. Lists of Conservative/Liberal users

H3 Neutral users can be identified


4. Followers of partisan accounts

H4 Differences in language use exist between moderate and extreme users


Data

- Political ideology
  - specific to country and culture
  - our use case is US politics (similar to all previous work)
  - the major US ideology spectrum is Conservative – Liberal
  - seven-point scale


We collect a new data set:

- 3,938 users (4.8M tweets)
- public Twitter handle with >100 posts

Political ideology is reported through an online survey

- only way to obtain unbiased ground truth labels (Flekova et al. 2016 ACL, Carpenter et al. 2016 SPPS)
- additionally reported age, gender and other demographics


- Data available at preotiuc.ro
  - full data for research purposes
  - aggregate for replicability
- Twitter Developer Agreement & Policy VII.A4: "Twitter Content, and information derived from Twitter Content, may not be used by, or knowingly displayed, distributed, or otherwise made available to any entity to target, segment, or profile individuals based on [...] political affiliation or beliefs"
- Study approved by the Institutional Review Board (IRB) of the University of Pennsylvania


Class Distribution

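[Figure: bar chart of user counts per class; y-axis 0 to 1000. Bar labels in extraction order: 401, 453, 696, 195, 501, 692, 594. The mapping of bar labels to classes is not recoverable from the extraction.]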


Data

For comparison to previous work, we collect a data set:

- 13,651 users (25.5M tweets)
- follow liberal/conservative politicians on Twitter


Hypotheses

H1 Previous studies used users far more likely to be politically engaged

H2 The prediction problem was so far over-simplified

H3 Neutral users can be identified

H4 Differences in language use exist between moderate and extreme users


Engagement

H1 Previous studies used users far more likely to be politically engaged

Manually coded:

- Political words (234)
- Political NEs: mentions of politician proper names (39)
- Media NEs: mentions of political media sources and pundits (20)
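A minimal sketch of how a per-user "percentage of political word usage" could be computed from such a list. The lexicon below is a hypothetical stand-in (the talk's manually coded lists are not reproduced here), and single-token matching with a simple regex tokenizer is an assumption, not necessarily the authors' procedure.

```python
import re

# Hypothetical mini-lexicon, for illustration only.
POLITICAL_TERMS = {"congress", "senate", "election", "vote"}

# Assumed tokenizer: lowercase runs of letters, digits, '#', '@' and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9#@']+")

def political_usage_pct(tweets, lexicon=POLITICAL_TERMS):
    """Percentage of a user's tokens that appear in the lexicon."""
    tokens = [tok for tweet in tweets for tok in TOKEN_RE.findall(tweet.lower())]
    if not tokens:
        return 0.0
    hits = sum(tok in lexicon for tok in tokens)
    return 100.0 * hits / len(tokens)

def group_average(users):
    """Mean of the per-user percentages for a group (a list of tweet lists)."""
    return sum(political_usage_pct(tweets) for tweets in users) / len(users)

user_a = ["Congress will vote on the bill today", "Great coffee this morning"]
user_b = ["Watching the game tonight"]
print(round(political_usage_pct(user_a), 2))
print(round(group_average([user_a, user_b]), 2))
```

Averaging per-user percentages, rather than pooling all tokens of a group, keeps very prolific users from dominating the group score; which of the two the talk used is not stated.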


Data set obtained using previous methods

[Figure: bar chart, "Political word usage across user groups"; y-axis: average percentage of political word usage, 0.00 to 4.00. Group labels were not captured in the extraction.]

Term type            Left group   Right group
Political Words      2.64         2.95
Politician Names     0.73         0.79
Media/Pundit Names   0.11         0.18


Our data set

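[Figure: bar chart, "Political word usage across user groups"; y-axis: average percentage of political word usage, 0.00 to 4.00. The two outer columns repeat the values for the data set obtained using previous methods; columns 1 to 7 are the survey classes.]

Term type            Prev.   1      2      3      4      5      6      7      Prev.
Political Words      2.64    0.76   0.55   0.42   0.36   0.46   0.51   0.76   2.95
Politician Names     0.73    0.24   0.14   0.07   0.07   0.09   0.12   0.19   0.79
Media/Pundit Names   0.11    0.03   0.03   0.02   0.02   0.03   0.03   0.04   0.18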



Take aways:

- 3x more political terms for automatically identified users compared to the highest survey-based scores
- almost perfectly symmetrical U-shape across all three types of political terms
- the difference between classes 1 and 2 (and between 6 and 7) is larger than the difference between 2 and 3 (and between 5 and 6)
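As a quick arithmetic check of these take-aways, the sketch below uses the Political Words values as read from the extracted Engagement chart. Assigning the extracted numbers to the outer (automatically identified) groups and to survey classes 1 to 7 follows extraction order and is an interpretation, not something the slides state explicitly.

```python
# Political Words values, assigned to groups by extraction order (an assumption).
auto_groups = (2.64, 2.95)  # users identified with previous methods
survey = {1: 0.76, 2: 0.55, 3: 0.42, 4: 0.36, 5: 0.46, 6: 0.51, 7: 0.76}

# Lowest automatically identified group vs. the highest survey class.
ratio = min(auto_groups) / max(survey.values())

# Steps between neighbouring classes on both sides of the scale.
outer_left, inner_left = survey[1] - survey[2], survey[2] - survey[3]
outer_right, inner_right = survey[7] - survey[6], survey[6] - survey[5]

print(f"automatic vs. highest survey class: {ratio:.2f}x")
print(f"step 1->2: {outer_left:.2f}, step 2->3: {inner_left:.2f}")
print(f"step 7->6: {outer_right:.2f}, step 6->5: {inner_right:.2f}")
```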



Over-simplification

H2 The prediction problem was so far over-simplified

[Figure: grouped bar chart of ROC AUC per binary task; y-axis .5 to 1.0.]

Features            CvL    1v7    2v6    3v5
Topics              .891   .785   .662   .581
Political Terms     .972   .785   .679   .590
Domain Adaptation   .976   .789   .690   .625

ROC AUC, Logistic Regression, 10-fold cross-validation
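A minimal, dependency-free sketch of the evaluation protocol named in the caption: logistic regression scored by ROC AUC under 10-fold cross-validation. The data are synthetic, the gradient-descent trainer is a stand-in for whatever solver the authors used, and pooling the out-of-fold scores before computing a single AUC is an assumption (averaging per-fold AUCs is the other common choice).

```python
import math
import random

def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_logreg(X, y, lr=0.5, epochs=200):
    """Unregularised logistic regression trained with batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def cross_val_auc(X, y, k=10, seed=0):
    """Score every example with a model that never saw it, then compute one pooled AUC."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = [0.0] * len(X)
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w, b = fit_logreg([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            scores[i] = b + sum(wj * xj for wj, xj in zip(w, X[i]))
    return roc_auc(y, scores)

# Synthetic two-feature data: class 1 centred at (+1, +1), class 0 at (-1, -1).
rng = random.Random(1)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(1.0 if yi else -1.0, 1.0) for _ in range(2)] for yi in y]
auc = cross_val_auc(X, y)
print(f"pooled 10-fold ROC AUC: {auc:.3f}")
```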


Predicting continuous political leaning (1 – 7)

[Figure: bar chart, "Leaning"; y-axis .00 to .40.]

Features    Pearson R
Unigrams    .294
LIWC        .286
Topics      .300
Emotions    .145
Political   .256
All         .369

Pearson R between predictions and true labels, Linear Regression, 10-fold cross-validation
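For reference, the metric in the caption can be sketched as follows. The labels and predictions are made-up toy values on the 1 to 7 scale, not results from the talk.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy example: true seven-point labels and hypothetical out-of-fold predictions.
true_labels = [1, 2, 3, 4, 5, 6, 7]
predictions = [3.5, 3.0, 4.0, 4.2, 4.1, 5.0, 4.8]
print(f"Pearson R: {pearson_r(true_labels, predictions):.3f}")
```

Pearson R rewards predictions that move in the right direction even when they are compressed towards the middle of the scale, as regression outputs often are.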


Seven-class classification

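[Figure: bar chart of accuracy for five methods, in extraction order: 19.60%, 22.20%, 24.20%, 26.20%, 27.60%; y-axis 0% to 30%. Method labels were not captured in the extraction.]

Accuracy, 10-fold cross-validation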

GR – Logistic regression with Group Lasso regularisation



Neutral Users

H3 Neutral users can be identified

[Figure: two word clouds, one with words associated with either extreme conservative or liberal users, the other with words associated with neutral users. Legend: correlation strength.]

Correlations are age and gender controlled. Extreme groups are combined using matched age and gender distributions.
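One common way to obtain age- and gender-controlled correlations is a second-order partial correlation. The sketch below uses the standard recursive formula on toy data; whether the authors computed it this way is an assumption, and all variable names and values are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z, from pairwise correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_r_two(x, y, z, w):
    """Correlation of x and y controlling for both z and w (recursive formula)."""
    xy_z = partial_r(pearson_r(x, y), pearson_r(x, z), pearson_r(y, z))
    xw_z = partial_r(pearson_r(x, w), pearson_r(x, z), pearson_r(w, z))
    yw_z = partial_r(pearson_r(y, w), pearson_r(y, z), pearson_r(w, z))
    return partial_r(xy_z, xw_z, yw_z)

# Toy data (centred, illustrative): word use and the outcome both track age,
# plus independent noise, so their raw correlation is driven entirely by age.
age = [1, 1, -1, -1, 1, 1, -1, -1]
gender = [1, -1, 1, -1, 1, -1, 1, -1]
noise_word = [1, 1, 1, 1, -1, -1, -1, -1]
noise_outcome = [1, -1, -1, 1, 1, -1, -1, 1]
word_use = [a + e for a, e in zip(age, noise_word)]
outcome = [a + e for a, e in zip(age, noise_outcome)]

raw = pearson_r(word_use, outcome)
controlled = partial_r_two(word_use, outcome, age, gender)
print(f"raw r = {raw:.2f}, controlled r = {controlled:.2f}")
```

In this toy case the raw correlation disappears once age is controlled for, which is the kind of confound the control is meant to remove.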


Political Engagement

H3a There is a separate dimension of political engagement

Combine the classes into a scale: 4 – 3&5 – 2&6 – 1&7

[Figure: grouped bar chart for "Leaning" and "Engagement"; y-axis .00 to .40.]

Features    Leaning   Engagement
Unigrams    .294      .165
LIWC        .286      .149
Topics      .300      .169
Emotions    .145      .079
Political   .256      .169
All         .369      .196

Pearson R between predictions and true labels, Linear Regression, 10-fold cross-validation
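The combined scale on this slide (4, then 3&5, 2&6, 1&7) amounts to the distance of a seven-point score from the midpoint. A minimal sketch; numbering the engagement levels 0 to 3 is an assumption, since the slide only gives the ordering.

```python
def engagement_level(ideology):
    """Map a seven-point ideology score to 0..3: 4 -> 0, 3&5 -> 1, 2&6 -> 2, 1&7 -> 3."""
    if not 1 <= ideology <= 7:
        raise ValueError("ideology must be between 1 and 7")
    return abs(ideology - 4)

print([engagement_level(score) for score in range(1, 8)])
```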



Moderate Users

H4 Differences between moderate and extreme users

[Figure: two word clouds, one with words associated with moderate liberals (5 and 6), the other with words associated with extreme liberals (7). Legend: relative frequency, correlation strength.]

Correlations are age and gender controlled.


Take Aways

- User-level trait acquisition methodologies can generate non-representative samples
- Political ideology:
  - Goes beyond binary classes
  - The problem was to date over-simplified
  - New data set available for research
  - New model to identify political leaning and engagement


Questions?

www.preotiuc.ro

wwbp.org