Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics...

26
detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge Studio (VKS) Information Studies MySpace comments case study

Transcript of Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics...

Page 1: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

detecting and analysing emotion in social network sites

Mike ThelwallStatistical Cybermetrics Research GroupUniversity of Wolverhampton, UK

Virtual Knowledge Studio (VKS)

Information Studies

MySpace commentscase study

Page 2: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

research motivation

sentiment is a frequently overlooked key factor in communication and relationshipsneeds to be investigated to understand the role of sentiment in new online environments identify suicide “at risk” discover emotional factors necessary for

sustained online environments modify bots to detect and react appropriately

to emotional communication

Page 3: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

talk structure

part 1: background information about MySpace comment communication

part 2: automatically detecting sentiment in MySpace comments

Page 4: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.
Page 5: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

MySpace comments arepublic or semi-public short messagesexchanged by Friends

but what is their purposeand what do they look like?

Page 6: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

comments

Displayed in public on home page – public personal messages

Page 7: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

purpose 1: gossip (53% of dialogs) – examples of gossip comments

I moved to Houston, Tx.I come home at the beginning of Julywell i just diyed my hair nearly black!!i regret not going to UMSX bc MZU is so much harderi sooo messed up :((for a white guy tim knows a lot of rap songTina talks about you all the time.Nigel said you were feeling bad

Page 8: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

purpose 2: coordination of offline activities (18% of dialogs)

CALL ME WHEN YOU GET A CHANCE hey text me sometime.. [number]i hope to see you toniiite <3I'm gonna be in ABD in Jan. for like a week, we gotta hang outHey I can call you 2day?!!

purpose 3: keeping in contact

Page 9: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

emotion in MySpace

how important is emotion expression in social network communication?who uses emotion and what type of emotion?

Page 10: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

emotion in Friend comments

most comments contain positive emotion (including formal expressions, such as “Love, Sue” or “raj x”)few contain negative emotion

Emotion +ve -ve

1 (none) 34% 80%

2 28% 6%

3 35% 11%

4 3% 2%

5 (strong) 0% 1%

Emotion strength in 819random comments

Page 11: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

emotion in Friend comments

positive emotion mainly used by females and mainly directed at femalesno gender difference in negative emotions

Fromfemale

Frommale

Tofemale

2.4 (+)

1.3 (-)

2.0 (+)

1.3 (-)

Tomale

2.2 (+)

1.3 (-)

1.7 (+)

1.5 (-)

Average emotion strength in 819 random comments

Page 12: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

CYBEREMOTIONS = data gathering + complex systems methods + ICT outputs

To identify and analyseCollective Emotionsin Cyberspace

Sentistrength

Page 13: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

problem 1: non-standard English in MySpace comments

Aspect of non-standard EnglishComm

ents

Typographic slang or abbreviations (e.g., omg, lol, hugz, @)

41%

Slang, including dialect, swearing, and idiomatic slang sayings

51%

Non-standard spelling other than the above 33%

Non-standard punctuation 81%

Pictograms 16%

Interjections (e.g., haha, muahh, huh, but not oh). 13%

Non-standard capitalisation 75%

Other non-standard English grammar 56%

Not standard formal written English (i.e., Any of the above)

97%

Page 14: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

common words in commentsRank Word

1-10 i, you, to, the, and, a, u, me, hey, my

11-20 it, for, in, love, is, that, so, up, your, on

21-30 have, of, are, just, lol, but, we, how, be, ya

31-40 at, was, well, what, get, like, good, im, know, out

41-50 been, this, with, see, hope, all, do, not, if, happy

51-60 miss, going, go, time, i'm, ur, back, some, got, there

61-70 when, can, will, thanks, its, or, by, from, now, whats

71-80 say, day, new, hi, much, one, no, about, haha, call

81-90 come, :), soon, too, need, birthday, 2, am, had, here

91-100 dont, doing, as, think, man, page, great, did, weekend, work

Bold words are not in the top 100 for general British English, and italic words are not in the top 100 for general American English.

Page 15: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

problem 2: swearing

rife in MySpaceconveys positive and negative emotionsignored by existing sentiment analysis methods

Page 16: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

emphatic adverb/adjective OR adverbial booster OR premodifying intensifying negative adjective (36% of swearing)

and we r guna go to town again n make a ryt fuckin nyt of it again lol

see look i'm fucking commenting u back lol and stop fucking tickleing me!! Thanks for the party last night it was

fucking good and you are great hosts. That 50's rock and roll weekender was

fucking mint! yeah so me and sarah broke up and

everythings fucking shit

Page 17: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

personal insult referring to defined entity (28% of swearing)

tehe i am sorry.. i m such a sleep deprived twat alot of the time! lolMaxy is the soundest cunt in the world!!!!3rd? i thought i was your main man number one? Fucker write bak cunt xxx You evil cunt! Haha lucky fuck

Page 18: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

idiomatic set phrase OR figurative extension of literal meaning (23%, mostly male)

think am gonna get him an album or summet fuck nows got another copy of the reaction CD (will had fucked the last one lol) qu'est ce que fuck? what the fuck pubehead whos pete and why is this necicery mate Heh long story.. cant be fucked to explain :D

Page 19: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

SentiStrength objective

1. detect positive and negative emotion in MySpace comments

2. develop workarounds for lack of grammar and spelling

3. harness emotion expression forms unique to MySpace or CMC (e.g., :-) or haaappppyyy!!!)

4. classify each MySpace comment as positive 1-5 AND negative 1-5

5. apply to social issues

Page 20: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

SentiStrength algorithm

spelling correction for repeated letters Helllllo -> Hello (emphasis: llll)

list of +ve and -ve words with strengths (party from LIWC; includes swearing) hate=-4, love =3

extra heuristics emphasis acts to enhance + or – emotion emotion words ignored in questions take strongest +ve & -ve expression in whole

comment booster words (e.g., very, some)

http://sentistrength.wlv.ac.uk/

Page 21: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

sentiment strength estimation example

HEEEEEEEEY BUDDY!!!!!!!!

HEY BUDDY!

HEY BUDDY!

+1 +1

overall – positive: 3, negative 1

1 +1=2

word +ve

hey 1

buddy

2

2 +1=3

translation and extraction of emphasis

Look up words in Sentiment strengthdictionary

Page 22: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

SentiStrength vs. std. classifiers

Algorithm Positive Negative

SentiStrength 60.9% 73.0%

Support Vector Machines 56.2% 73.6%

Simple logistic regression

55.0% 72.8%

J48 classification tree 54.9% 72.6%

Naïve Bayes 54.9% 67.3%

Decision table 54.8% 73.8%

JRip rule-based classifier 54.1% 73.1%

Multilayer Perceptron 49.6% 71.4%

Baseline 41.6% 71.2%

Random 20.0% 20.0%

10-foldcross-validationon 1041human-classifiedcomments

Page 23: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

application - evidence of emotion homophily in MySpace

automatic analysis of sentiment in 2 million comments exchanged between MySpace friends correlation of 0.227 for +ve emotion strength and 0.254 for –vepeople tend to use similar but not identical levels of emotion to their friends in messages

Page 24: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

conclusions

social network sites are a source of sentiment expressed in very informal languagecan identify positive and negative sentiment with reasonable accuracyapplications: identifying social trends Identifying potential emotional

“anomalies”

Page 25: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

bibliographyThelwall, M., Buckley, K., Paltoglou, G., Cai, D. & Kappas, A. (under review). Sentiment strength detection in short informal text. Thelwall, M., Wilkinson, D. & Uppal, S. (2010). Data mining emotion in social network communication: Gender differences in MySpace, Journal of the American Society for Information Science and Technology, 61(1), 190-199. Thelwall, M. (2008). Fk yea I swear: Cursing and gender in a corpus of MySpace pages, Corpora, 3(1), 83-107.Thelwall, M. (2009). Homophily in MySpace, Journal of the American Society for Information Science and Technology. 60(2), 219-231.Thelwall, M. (2009). Social network sites: Users and uses. In: M. Zelkowitz (Ed.), Advances in Computers 76. Amsterdam: Elsevier (pp. 19-73). Thelwall, M. & Wilkinson, D. (2010). Public dialogs in social network sites: What is their purpose?, Journal of the American Society for Information Science and Technology, 61(2), 392-404http://www.cyberemotions.eu/snic.ppt

Page 26: Detecting and analysing emotion in social network sites Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK Virtual Knowledge.

references 2

Gobron, S., Ahn, J., Paltoglou, G., Thelwall, M. & Thalmann, D. (in press). From sentence to emotion: A real-time three-dimensional graphics metaphor of emotions extracted from text. The Visual Computer: International Journal of Computer Graphics.Thelwall, M. (2009). MySpace comments. Online Information Review, 33(1), 58-76.Thelwall, M. (2008). Social networks, gender and friending: An analysis of MySpace member profiles, Journal of the American Society for Information Science and Technology, 59(8), 1321-1330.

http://www.danah.org/researchBibs/sns.html