Social media analysis with NLP - Carnegie Mellon...
Social media analysis with NLP
Michael Miller Yoder
28 April 2020
1
Overview
1. Motivation: language in social context
2. Examples of NLP approaches to modeling identity
Experiment 1: Effects of self-presentation on interaction in social media
Experiment 2: Portrayal of characters and relationships in narrative (fanfiction)
5
language embedded in social context
6
What types of social contexts is language used in?
7
What types of social contexts?
8
9
10
11
12
For NLP, what is language?
13
14
Timeline (1990, 2000, 2010, 2020): statistical machine learning NLP; Penn Treebank (1987-1989)
15
news
16
news, 1987-1989
17
Timeline (1990, 2000, 2010, 2020): statistical machine learning NLP, then neural NLP; Penn Treebank (1987-1989); BERT
18
19
SOCIAL
20
language
speakers audience
situations purposes
21
Penn Treebank
1987-1989
credit: Amir Zeldes, [Zeldes & Simonson 2016]
Typical rates in the secondary market : 8.65 % one month ; 8.65 % three months ; 8.55 % six months. BANKERS ACCEPTANCES : 8.52 % 30 days ; 8.37 % 60 days ; 8.15 % 90 days ; 7.98 % 120 days ; 7.92 % 150 days ; 7.80 % 180 days.
22
language is always embedded in social context
NLP + social science: applications
23
hate speech detection community norms
NLP + social science: applications
24
fairness and bias
Garg et al. 2017
media framing
https://criticalmediareview.wordpress.com/2015/10/19/what-is-media-framing/
NLP + social science: applications
25
dialectal NLP tools
Garg et al. 2017; www.tes.com
Overview
1. Motivation: language in social context
2. Examples of NLP approaches to modeling identity
Experiment 1: Effects of self-presentation on interaction in social media
Experiment 2: Portrayal of characters and relationships in narrative (fanfiction)
26
27
28
29
Models of identity
identity
30
Critical identity approaches
“identity is the product rather than the source of linguistic and other semiotic practices … is social and cultural rather than primarily internal”
sociolinguistics
[Bucholtz and Hall 2005]
31
identity
Critical identity approaches
“identity is the product rather than the source of linguistic and other semiotic practices … is social and cultural rather than primarily internal”
sociolinguistics
[Bucholtz and Hall 2005]
32
identity
society, culture
Critical identity approaches
“As a shifting and contextual phenomenon, gender does not denote a substantive being”
gender studies
[Butler 1990]
33
Critical identity approaches
(changing) identity
“As a shifting and contextual phenomenon, gender does not denote a substantive being”
gender studies
[Butler 1990]
society, culture
34
Critical identity approaches
“race and sex become grounded in experiences that actually represent only a subset of a much more complex phenomenon.”
critical race theory
[Crenshaw 1989]
35
(intersectional) identity
Critical identity approaches
“people have multiple identities connected not to their ‘internal states’ but to their performances in society”
discourse analysis
[Gee 2000]
36
identities
Computational identity approaches
“classify latent user attributes, including gender, age, regional origin, and political orientation solely from Twitter user language”
computer science
[Rao et al. 2010]
37
identity
“a [deep neural network] can be used to identify sexual orientation from facial images”
computer vision
[Kosinski and Wang 2018]
38
identity
Computational identity approaches
Can we investigate the production of identity in language with computational models?
39
Avoid naturalizing structures of identity and further marginalizing those who don’t fit them (Butler 1990)
Discover how notions of identity are being reinforced/challenged/reinvented
40
?language + social
data y = f(x)
machine learning
1. Self-presentation effects on social media
Qinlan Shen, CMU Language Technologies Institute
Alex Coda, CMU Language Technologies Institute
Carolyn P. Rosé, CMU Language Technologies Institute
Yunseok Jang, U Michigan Computer Science & Eng
Yale Song, Microsoft Research
Kapil Thadani, Yahoo Research
WebSci 2020
Explicit identity positioning
● Working identity definition: “social positioning of self and other” [Bucholtz & Hall 2010]
● How does the social positioning of self affect interaction on social media?
● Tumblr as a site with particular identity implications, as well as social interaction
42
43
44
Lyca / 25
Blog descriptions on Tumblr
45
● Free-form text bio boxes
● Labeling practices outside gender/sexuality binaries [Oakley 2016]
max | 18yo | she/they | girl with dreams | twerfs don't follow
andre | he/him | 22 | mexican ✨trans | too many fandoms
hey! annie, she/hers, love me, infj
Identity categories
● age
● ethnicity/nationality
● fandoms
● gender
● interests
● location
● personality type
● pronouns
● relationship status
● sexual orientation
● zodiac sign
Example labels
fandoms: shipping, star wars, lotr, homestuck
gender: woman, husband, mtf, nonbinary
age: 24, xviii, 35yo, nineteen
46
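One way to picture how free-form bios map onto these categories is a small pattern-based annotator. This is an illustrative sketch only: the `CATEGORY_PATTERNS` table and the `annotate_bio` function are hypothetical names, the patterns cover just a handful of labels from the example bios, and nothing here should be read as the annotation procedure used in the study.

```python
import re

# Hypothetical, deliberately tiny pattern table; a real lexicon would be far larger.
CATEGORY_PATTERNS = {
    "age": re.compile(r"\b(\d{2})(?:yo)?\b"),
    "pronouns": re.compile(r"\b(she|he|they)/(her|hers|him|them|they)\b"),
    "personality type": re.compile(r"\b([ei][ns][ft][jp])\b"),
    "gender": re.compile(r"\b(woman|girl|male|female|trans|nonbinary|mtf)\b"),
}

def annotate_bio(bio: str) -> dict:
    """Return the labels found in a bio, grouped by identity category."""
    text = bio.lower()
    found = {}
    for category, pattern in CATEGORY_PATTERNS.items():
        labels = [m.group(0) for m in pattern.finditer(text)]
        if labels:
            found[category] = labels
    return found

bio_labels = annotate_bio("andre | he/him | 22 | mexican ✨trans | too many fandoms")
```

A lookup like this misses spelled-out or Roman-numeral ages ("nineteen", "xviii"), which is one reason the label lists above are broader than any short set of patterns.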
What effects of similarities and differences in self-positioning do we see on content propagation in Tumblr?
48
blog descriptions reblogging
Reblog prediction
● Reblog "opportunity": a follower has two posts, from two different followees, available at a similar time, and reblogs one of them
● Learning to rank pairwise formulation
Diagram labels: follower; two followees, each with a post; example followee descriptions "she/her 25 | nyc" and "reylo fan"; reblog; similar time
51
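The pairwise learning-to-rank setup can be sketched as follows. This is a minimal illustration, assuming a linear scoring function trained with logistic loss on feature differences; the slides do not say which model was used, and the feature names and values are made up.

```python
import math

def train_pairwise_ranker(pairs, n_features, epochs=200, lr=0.1):
    """Fit weights w so that score(reblogged) > score(not reblogged).

    Each pair is (features_of_reblogged_post, features_of_other_post).
    The loss is logistic loss on the score difference w . (x_pos - x_neg).
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for x_pos, x_neg in pairs:
            diff = [p - n for p, n in zip(x_pos, x_neg)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Gradient of log(1 + exp(-margin)) with respect to w is grad_scale * diff.
            grad_scale = -1.0 / (1.0 + math.exp(margin))
            w = [wi - lr * grad_scale * di for wi, di in zip(w, diff)]
    return w

def prefers_first(w, x_a, x_b):
    """Predict whether post A is more likely to be reblogged than post B."""
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    return score(x_a) > score(x_b)

# Toy features (hypothetical): [pronoun_category_match, shared_interest_label, log_likes]
pairs = [
    ([1.0, 1.0, 0.5], [0.0, 0.0, 0.6]),
    ([1.0, 0.0, 0.2], [0.0, 0.0, 0.3]),
    ([0.0, 1.0, 0.9], [0.0, 0.0, 0.8]),
]
w = train_pairwise_ranker(pairs, n_features=3)
```

Framing the task as a comparison between two posts seen at a similar time means the model only has to explain why one post was chosen over the other, not how likely any post is to be reblogged in absolute terms.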
Data
Number of users: 34,797
Reblog prediction instances: 712,670
Timeframe: June - Nov 2018
52
Control features
53
post hashtags
number of likes, comments
post type (text, photo, etc)
Identity features
FOLLOWER: they/them 29 leo infj
FOLLOWED: 22 male infj coffee 🌈
● match: age (both descriptions include an age)
● match: personality type (both descriptions include a personality type)
● mismatch: pronouns (the follower presents pronouns, the followed blog does not)
● match: infj (the same label appears in both descriptions)
● followed: 22, follower: 29 (label values on each side)
59
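The match and mismatch features above can be sketched in a few lines. This is an illustration under assumptions: the function name `identity_features`, the feature-name strings, and the assignment of each label to a category are all hypothetical, and the study's actual feature set may differ.

```python
def identity_features(follower: dict, followed: dict) -> dict:
    """Compare two annotated bios.

    Each bio maps an identity category (e.g. "age") to the set of labels
    presented for it. Absent categories are simply missing from the dict.
    """
    features = {}
    for category in sorted(set(follower) | set(followed)):
        in_follower = category in follower
        in_followed = category in followed
        # Category-level features: do both users present this category, or only one?
        features[f"category_match:{category}"] = in_follower and in_followed
        features[f"category_mismatch:{category}"] = in_follower != in_followed
        # Label-level feature: do they present an identical label?
        shared = follower.get(category, set()) & followed.get(category, set())
        for label in sorted(shared):
            features[f"label_match:{label}"] = True
    return features

# The two example bios, with categories assigned by hand for illustration.
follower = {"pronouns": {"they/them"}, "age": {"29"}, "zodiac sign": {"leo"},
            "personality type": {"infj"}}
followed = {"age": {"22"}, "gender": {"male"}, "personality type": {"infj"},
            "interests": {"coffee"}}
feats = identity_features(follower, followed)
```

Here both bios include an age, so the age category matches even though 22 and 29 differ, while the shared "infj" produces a label-level match as well.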
Is there an effect?
60
Self-presentation labels are associated with content propagation
What is the nature of this effect?
● Establishing solidarity: categories and label matches were positively associated with reblogging
61
Examples of matches (same label or category in both descriptions):
indie / indie
sappho / sappho
any sexual orientation / any sexual orientation
What is the nature of this effect?
62
Features | Likelihood of reblogging
Follower: presents pronouns; Followed: does not | ↓
Both: cis or cishet | ↑
Race/ethnicity label alignment | ↑
Nationality label alignment | none
What is the nature of this effect?
63
Features | Likelihood of reblogging
Follower: gaming; Followed: manga | ↑
Follower: memes; Followed: history | ↓
Conclusion
● Evidence for an association between explicit, self-presented identity information and content propagation
○ Most studies use only content and network features to predict content propagation [Naveed et al. 2011, Zhang et al. 2016, Vosoughi et al. 2018]
● Users who presented labels that indicated shared interests or shared values were more likely to share each other’s content
64
2. Changes in portrayal of characters in narrative
65
Qinlan Shen
Luke Breitfeller
Carolyn P. Rosé
James Fiacco
Shefali Garg
Ethan Xuanyue Yang
Huiming Jin
Hariharan Muralidharan
Fanfiction
67
● Stories based on existing media [Fiesler et al. 2016]
● “Participatory culture” [Jenkins 2003]
Canon: the original work fanfiction is based on
Ship: romantic relationship between characters
Fic: a specific fanfiction story
Fanfiction
● “Queer female space” [Lothian et al. 2007]
○ queer pairings
○ female characters [Bamman & Milli 2016]
○ gender-swapping
○ desire outside heterosexual, cisgender norms
Pairing categories: M/M, F/M, F/F
thedcontinuum.wordpress.com
70
How do fanfiction authors use language to shift character identities from canon?
71
1. Locate text that is relevant for characterization
2. Test ability to capture changes in relationship portrayal
3. Describe patterns in characterization shifts
Text extraction
72
github.com/michaelmilleryoder/fanfiction-nlp
Based on BookNLP [Bamman et al. 2014]
73
Methods
● Word embeddings [Mikolov et al. 2013a] for social questions
○ Stereotypes and bias in corpora [Garg et al. 2018]
○ Framing by different social groups [An et al. 2018]
● Can word embeddings capture social framing of relationships in fanfiction?
Data
74
Harry Potter stories, Archive of Our Own
>179k stories (as of 2018)
Characters
● Harry Potter
● Hermione Granger
● Draco Malfoy
● Ron Weasley
● Ginny Weasley
Pairings by popularity
● Draco/Harry
● Hermione/Ron
● Draco/Hermione
● Ginny/Harry
● Harry/Hermione
● Harry/Ron
Prediction task
75
● Does the relationship match canon in being romantic/not romantic?
● False (relationship is changed) if
○ not romantic in canon and romantic in fanfiction or
○ romantic in canon and not romantic in fanfiction
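The label definition above amounts to checking whether the two romantic/not-romantic judgments agree. A minimal sketch, assuming a hand-built set of canon-romantic pairings; the names `CANON_ROMANTIC` and `matches_canon` are hypothetical, and the set's contents are an illustrative assumption rather than something stated in the slides.

```python
# Assumption for illustration: which of the listed pairings count as romantic in canon.
CANON_ROMANTIC = {
    frozenset({"Hermione", "Ron"}),
    frozenset({"Ginny", "Harry"}),
}

def matches_canon(character_a: str, character_b: str, romantic_in_fic: bool) -> bool:
    """True if the fic keeps the canon status of the relationship.

    False covers both kinds of change: not romantic in canon but romantic
    in the fic, and romantic in canon but not romantic in the fic.
    """
    romantic_in_canon = frozenset({character_a, character_b}) in CANON_ROMANTIC
    return romantic_in_canon == romantic_in_fic

draco_harry_label = matches_canon("Draco", "Harry", romantic_in_fic=True)
```

Using `frozenset` makes the lookup order-independent, so "Harry"/"Ginny" and "Ginny"/"Harry" refer to the same pairing.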
Relationship representations
76
Harry wept at the sight of Hermione in the garden.
Ron looked down at his shoe. Troll bogeys. He would have to tell Harry about this.
Harry Hermione Harry Ron
● Weighted average of word embeddings in a 10-word window around character name mentions
77
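The windowed representation can be sketched as below for a single character name. Two details are assumptions, since the slide does not specify them: the window is taken as 10 tokens on each side of a mention, and each context word is weighted by the inverse of its distance to the mention. The embeddings are made-up 2-dimensional values, and `mention_context_vector` is a hypothetical name.

```python
def mention_context_vector(tokens, embeddings, name, window=10):
    """Weighted average of embeddings of words near each mention of `name`.

    Assumptions (not specified in the slides): the window extends `window`
    tokens on each side of a mention, and each context word is weighted by
    1 / distance to the mention. Words without an embedding are skipped.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for i, token in enumerate(tokens):
        if token != name:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i or tokens[j] not in embeddings:
                continue
            weight = 1.0 / abs(j - i)
            vec = embeddings[tokens[j]]
            total = [t + weight * v for t, v in zip(total, vec)]
            weight_sum += weight
    if weight_sum == 0.0:
        return None  # no usable context for this name
    return [t / weight_sum for t in total]

# Toy 2-dimensional embeddings (made-up values for illustration).
embeddings = {"wept": [1.0, 0.0], "garden": [0.0, 1.0]}
tokens = "harry wept at the sight of hermione in the garden".split()
harry_vec = mention_context_vector(tokens, embeddings, "harry")
```

A representation for a pairing such as Harry/Hermione could then be built from the context vectors of both names, though how the two are combined is not stated in the slides.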
Visualization
● Track changes in contextualized embeddings for character names across fics
○ Train RNN-based language model and take final hidden state as contextualized word representation [Peters et al. 2018]
78
Visualization
Hermione sat in the front of the classroom. She...
Fleur whistled softly. "Hermione! Come here...
[ 0.34 0.72 0.21 … ]
[ 0.89 0.06 0.53 … ]
79
80
81
Canon vector is close to the center of the fanfiction vectors: harry
Canon vector is on the edge of fanfiction vectors: draco, remus, sirius
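The "center" versus "edge" observation comes from a visualization, but a rough numeric analogue can be sketched: compare the canon vector's distance from the fanfiction centroid with the typical spread of the fanfiction vectors. The `canon_centrality` measure below is a hypothetical illustration, not a statistic reported in the talk, and the vectors are toy values.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canon_centrality(canon_vec, fic_vecs):
    """Distance from canon to the fanfiction centroid, divided by the mean
    distance of the fanfiction vectors to their own centroid.

    Values well below 1 suggest canon sits near the center; values above 1
    suggest it sits toward or beyond the edge.
    """
    center = centroid(fic_vecs)
    mean_spread = sum(euclidean(v, center) for v in fic_vecs) / len(fic_vecs)
    return euclidean(canon_vec, center) / mean_spread

# Toy fanfiction vectors arranged in a square around (1, 1).
fic_vecs = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
central = canon_centrality([1.0, 1.0], fic_vecs)
peripheral = canon_centrality([3.0, 1.0], fic_vecs)
```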
Conclusion
82
● Word embedding approaches can capture types of character framing
○ See evidence of differences in characterization, relationships
● Differences often match known fanfiction trends
Conclusion
83
Computational models of identity in language
● Computational techniques to analyze and model the presentation of identity in discourse
● The effects of the choice of self-presentation (Experiment 1)
● How identities are represented in changing ways in narrative (Experiment 2)
84
language embedded in social context
85
Thank you!
86