Observational studies in social media

Observational Studies
Class: Data Mining Technology for Business and Society
Program: M.Sc. Data Science
University: Sapienza University of Rome
Semester: Spring 2016
Lecturer: Carlos Castillo, http://chato.cl/
Sources: Multiple papers; see the beginning of each section.


Page 1: Observational studies in social media

Observational Studies


Page 2: Observational studies in social media

Matching is a popular technique

Randomized controlled experiment

1. Response of subjects assigned to treatment is compared to response of subjects assigned to control

2. Assignment of subjects to groups is done using a randomization device

3. Treatment is under the control of a researcher

Matching observational study

1. Response of subjects assigned to treatment is compared to response of subjects assigned to control

2. Assignment of subjects to the control group is done by matching the characteristics and size of the treatment group

3. Treatment is not under the control of a researcher

Page 3: Observational studies in social media

Matching design: hurricanes and online friendships

Phan, Tuan Q., and Edoardo M. Airoldi. "A natural experiment of social network formation and dynamics." Proceedings of the National Academy of Sciences 112.21 (2015): 6595-6600.

Page 4: Observational studies in social media

Example: US universities and Hurricane Ike in 2008

Phan, Tuan Q., and Edoardo M. Airoldi. "A natural experiment of social network formation and dynamics." Proceedings of the National Academy of Sciences 112.21 (2015): 6595-6600.

Treatment n=5

Control n=10

Study group n=130

Page 5: Observational studies in social media

Selection of control group

● Facebook posts from 1.5M students in 130 universities

● Matched 5 affected with 10 unaffected:
– Similar: size, college ranking according to US News, whether the colleges are public or private institutions, tuition fees, and other regional factors
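A minimal sketch of how 5 affected universities could be matched with 10 unaffected ones. This is not the authors' procedure: the covariate names, the standardized Euclidean distance, and the greedy 1:2 nearest-neighbor strategy are all assumptions made for illustration.

```python
from math import sqrt

def standardize(rows, keys):
    """Rescale each covariate to zero mean and unit variance so that
    no single covariate (e.g. tuition in dollars) dominates the distance."""
    stats = {}
    for k in keys:
        values = [r[k] for r in rows]
        mean = sum(values) / len(values)
        std = sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
        stats[k] = (mean, std)
    return [{**r, **{k: (r[k] - stats[k][0]) / stats[k][1] for k in keys}}
            for r in rows]

def match_controls(treated, candidates, keys, per_treated=2):
    """Greedy nearest-neighbor matching without replacement.
    Returns {treated name: [names of matched controls]}."""
    rows = standardize(treated + candidates, keys)
    t_rows, c_rows = rows[:len(treated)], rows[len(treated):]
    available = list(c_rows)
    matches = {}
    for t in t_rows:
        # Sort the remaining candidates by squared distance to this treated unit.
        available.sort(key=lambda c: sum((t[k] - c[k]) ** 2 for k in keys))
        matches[t["name"]] = [c["name"] for c in available[:per_treated]]
        available = available[per_treated:]
    return matches

# Hypothetical records; "size", "rank" and "tuition" are illustrative fields.
affected = [{"name": "T1", "size": 30000, "rank": 50, "tuition": 10000}]
unaffected = [
    {"name": "C1", "size": 31000, "rank": 55, "tuition": 11000},
    {"name": "C2", "size": 5000, "rank": 5, "tuition": 45000},
    {"name": "C3", "size": 29000, "rank": 48, "tuition": 9500},
    {"name": "C4", "size": 8000, "rank": 150, "tuition": 30000},
]
print(match_controls(affected, unaffected, ["size", "rank", "tuition"]))
```

Greedy matching depends on the order in which treated units are processed; an optimal assignment would avoid that, at the cost of more code.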

Page 6: Observational studies in social media

Results (red = treatment, blue = control)

Both undergo densification

Treatment has larger clustering coefficient (more triangles)
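For reference, a small sketch of one common definition, the average local clustering coefficient, computed on an undirected friendship graph. The slide does not say which variant of the coefficient the paper reports, so the choice here is an assumption.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(edges):
    """Mean of the local clustering coefficient over all nodes."""
    adj = build_adjacency(edges)
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# A triangle is fully clustered; a path has no triangles at all.
print(average_clustering([(1, 2), (2, 3), (1, 3)]))
print(average_clustering([(1, 2), (2, 3)]))
```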

Page 7: Observational studies in social media

Matching design: exercise and stress as reflected on Twitter

Dos Reis, Virgile Landeiro, and Aron Culotta. "Using matched samples to estimate the effects of exercise on mental health from Twitter." Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.

Page 8: Observational studies in social media

[Diagram: Exercise → Mood]

Page 9: Observational studies in social media

Design

1) Detect exercise at time t1
– The user posts a message containing one of the hashtags #runkeeper, #nikeplus, #runtastic, #endomondo, #mapmyrun, #strava, #cyclemeter, #fitstats, #mapmyfitness, or #runmeter

2) Measure mood at time t2 > t1
– Automatic classifier of mood with three important classes: hostility (or anger), depression (or dejection), anxiety (or tension)
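Step 1 can be sketched as a hashtag filter over a user's timeline. The timeline format (a list of `(timestamp, text)` pairs) and the function names are assumptions; only the hashtag list comes from the slide.

```python
import re

# Hashtags listed on the slide, lowercased and without the leading '#'.
EXERCISE_HASHTAGS = {
    "runkeeper", "nikeplus", "runtastic", "endomondo", "mapmyrun",
    "strava", "cyclemeter", "fitstats", "mapmyfitness", "runmeter",
}
HASHTAG_RE = re.compile(r"#(\w+)")

def mentions_exercise(text):
    """True if the message contains one of the exercise hashtags (case-insensitive)."""
    return any(tag.lower() in EXERCISE_HASHTAGS for tag in HASHTAG_RE.findall(text))

def first_exercise_time(timeline):
    """Earliest timestamp t1 of an exercise message, or None if there is none.
    `timeline` is a list of (timestamp, text) pairs in any order."""
    times = [ts for ts, text in timeline if mentions_exercise(text)]
    return min(times) if times else None

# Hypothetical timeline with integer timestamps.
timeline = [(5, "nice day"), (9, "Morning run #Strava"), (7, "10k done #nikeplus")]
print(first_exercise_time(timeline))
```

Messages posted after the returned t1 would then be passed to the mood classifier in step 2.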

Page 10: Observational studies in social media

Mood classifier

● Hostility (or anger): “shut your freaking yaphole”

● Depression (or dejection): “such a horrible day”

● Anxiety (or tension): “nervous for Monday”

Page 11: Observational studies in social media

Control = Random users (same country and language)

[Bar chart: percent change after exercising. Hostility: -21.1; Dejection: -5.4; Anxiety: -7.9]

Page 12: Observational studies in social media

Problem: missing variables

[Diagram: Exercise, Mood, and Demographics as a missing variable]

Page 13: Observational studies in social media

Matching method ...

● For each user in treatment, find another user that:
– Is a reciprocal friend of the user
– Is in the same city/state
– Has the same gender
– Has the closest number of followers, followees, and tweets
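The matching rule above could be sketched as follows. The user record layout and the distance on log-scaled counts are assumptions for illustration; the slide only says "closest number" without defining the distance.

```python
from math import log1p

def reciprocal_friends(user):
    """Ids of users who both follow and are followed by `user`."""
    return set(user["followers"]) & set(user["followees"])

def profile_distance(a, b):
    """Assumed distance: sum of absolute differences of log-scaled counts."""
    return sum(abs(log1p(a[k]) - log1p(b[k]))
               for k in ("n_followers", "n_followees", "n_tweets"))

def find_match(user, users_by_id, treated_ids):
    """Return the id of the best matched control for `user`, or None."""
    candidates = [
        users_by_id[fid]
        for fid in reciprocal_friends(user)
        if fid in users_by_id
        and fid not in treated_ids                      # controls did not exercise
        and users_by_id[fid]["location"] == user["location"]
        and users_by_id[fid]["gender"] == user["gender"]
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: profile_distance(user, c))["id"]
```

Usage, with hypothetical records: `find_match(users["a"], users, treated_ids={"a"})` returns the id of the reciprocal friend in the same location and of the same gender whose activity counts are closest to those of user `"a"`.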

Page 14: Observational studies in social media

Control = Matched users (blue)

[Bar chart: percent change after exercising, by control group]

Random control: Hostility -21.1; Dejection -5.4; Anxiety -7.9

Matched control: Hostility 0.9; Dejection -3.9; Anxiety -2.7

Page 15: Observational studies in social media

Can you guess a possible explanation?

[Two bar charts comparing random control, matched control, and treatment: (1) %Female and %from CA, on a 0 to 60 scale; (2) #Followers and #Friends, on a 0 to 500 scale]

Page 16: Observational studies in social media

Matching design and difference-in-differences: question answering sites

Hüseyin Oktay, Brian J. Taylor, and David D. Jensen. 2010. Causal discovery in social media using quasi-experimental designs. SOMA 2010.

Page 17: Observational studies in social media

Stack Overflow

Page 18: Observational studies in social media

Research question

● What happens after an answer is accepted?

● Does this inhibit people from answering?

Page 19: Observational studies in social media

The lifetime of a question

● Most answers are received shortly after a question is posed

● Over time, fewer answers are received

● At some point, an answer might be accepted by the asker

Page 20: Observational studies in social media

Measurement

● Rate after

● Rate before

● Answer rate change
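The formulas on this slide did not survive in the transcript. One plausible reading, stated here as an assumption rather than the paper's definition: with an answer accepted at time t after the question is posted, the rate before is the number of answers up to t divided by t, the rate after is the number of answers in the following window Δt divided by Δt, and the answer rate change is their difference.

```python
def answer_rates(answer_times, t, delta_t):
    """Return (rate_before, rate_after, change) for one question.

    answer_times: times of answers, measured from when the question was posted
    t: time at which an answer was accepted (assumed > 0)
    delta_t: length of the observation window after acceptance (assumed > 0)
    """
    if t <= 0 or delta_t <= 0:
        raise ValueError("t and delta_t must be positive")
    n_before = sum(1 for a in answer_times if a <= t)
    n_after = sum(1 for a in answer_times if t < a <= t + delta_t)
    rate_before = n_before / t
    rate_after = n_after / delta_t
    return rate_before, rate_after, rate_after - rate_before

# Hypothetical question: four answers in the first 4 hours, one more later.
print(answer_rates([1, 2, 3, 4, 10], t=4, delta_t=8))
```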

Page 21: Observational studies in social media

Results

● Results indicate that the average answer rate change is negative, i.e., there are fewer answers after an answer is selected

What is suspicious about this?

Page 22: Observational studies in social media

Matching design

Treatment Control (matched)

The matched question: (1) has no accepted answer by t+Δt, (2) has similar N_t/t, and (3) has similar N_{t+Δt}/Δt

Page 23: Observational studies in social media

Difference-in-differences

● Difference-in-differences:
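The slide's formula is missing from the transcript. The generic difference-in-differences estimate is the average change in the treatment group minus the average change in the control group. A minimal sketch with hypothetical answer rates:

```python
from statistics import mean

def difference_in_differences(treatment, control):
    """Each argument is a list of (before, after) measurements, one per unit.
    Returns (mean change in treatment) - (mean change in control)."""
    treat_change = mean([after - before for before, after in treatment])
    control_change = mean([after - before for before, after in control])
    return treat_change - control_change

# Hypothetical answer rates (answers per hour) before and after time t.
treated_questions = [(1.0, 0.5), (2.0, 1.5)]    # answer accepted at t
matched_questions = [(1.0, 0.25), (2.0, 1.0)]   # no accepted answer by t + Δt
print(difference_in_differences(treated_questions, matched_questions))
```

In this made-up example both groups slow down, but the treated questions slow down less, so the estimate is positive even though each group's own change is negative.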

Page 24: Observational studies in social media

Results

● The matching design shows that the answering rate change is more positive for treatment questions (those having a selected answer)

● Having a selected answer slows down the reduction in answering rate = more answers!

Page 25: Observational studies in social media

Propensity score matching: actions and outcomes

Alexandra Olteanu, Onur Varol, and Emre Kıcıman. "Towards an Open-Domain Framework for Distilling the Outcomes of Personal Experiences from Social Media Timelines." International Conference on Web and Social Media (ICWSM), AAAI (Association for the Advancement of Artificial Intelligence), 17 May 2016.

All the slides in this section are from the authors' talks.

Page 26: Observational studies in social media

Have a question? Ask the Internet!

should i go to law school

should i take a multivitamin

should i text her or wait for her to text me

should i join the military

should i leave my husband

should i get married

should i pop a burn blister

should i see a doctor

should i consolidate my student loans

should i do cardio before or after weights

should i get a tattoo

Page 27: Observational studies in social media

Idea

● Open-domain system to extract ...

Situation → Action → Outcomes

● … from social media

● Assume there will be many mistakes

● Attempt the best possible design

Page 28: Observational studies in social media

Example

T1: “I got a kitten! We named her Versace :-)”

Page 29: Observational studies in social media

Example

T1: “I got a kitten! We named her Versace :-)”

T2: “No sleep because the damn kitten is nuts!”

Page 30: Observational studies in social media

Basic operations

(1) Extract timelines

(2) Match events

(3) Precedents and subsequents
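Operation (3) can be sketched as splitting each timeline around the first occurrence of a target event. The timeline representation, a list of `(timestamp, event)` pairs with already canonicalized event names, is an assumption.

```python
from collections import Counter

def split_timeline(timeline, target):
    """Split a timeline around the first occurrence of `target`.
    Returns (precedents, subsequents), or None if the event never occurs."""
    ordered = sorted(timeline)
    times = [ts for ts, event in ordered if event == target]
    if not times:
        return None
    t0 = times[0]
    precedents = [event for ts, event in ordered if ts < t0]
    subsequents = [event for ts, event in ordered if ts > t0 and event != target]
    return precedents, subsequents

def count_subsequents(timelines, target):
    """Number of timelines in which each event follows the target event."""
    counts = Counter()
    for timeline in timelines:
        parts = split_timeline(timeline, target)
        if parts is not None:
            counts.update(set(parts[1]))   # count each event once per timeline
    return counts

# Hypothetical timeline with integer timestamps.
timeline = [(3, "no sleep"), (1, "visited shelter"), (2, "got a cat")]
print(split_timeline(timeline, "got a cat"))
```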

Page 31: Observational studies in social media

Many sub-problems

● Identification of experiential messages

● Timestamping event occurrences

● Recognition and canonicalization of events

● Identification of precedent and subsequent events

● Identification of positive and negative valence of events

Page 32: Observational studies in social media

Experiential messages classifier

Personal experiences:

● Just completed a 15.72km run with @RunKeeper. Check it out! <URL> #RunKeeper

● Just to set the mood I brought some Marvin Gaye and Chardonnay

● Lacrosse is so much fun why didn’t I start earlier lol

● Oh yeah guys we got a new puppy

Other (news, 3rd person, etc.):

● New campaign to protect children from second hand smoke launched <URL>

● Whoa. The kid from Cincinnati just suffered a horrible injury. Not good.

● @Bob I hear you.

● @Charlie did you enjoy your night at the club?

Naïve Bayes classifier

● Features = collocated tokens

● 10k labeled tweets

● Fleiss’ kappa = 0.325

Label distribution:

● 26% of tweets mention personal experiences

● 8% mention goals/desires

● 66% are news/3rd person or other tweets
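A toy multinomial Naive Bayes classifier in the spirit of the one described above. It is a sketch, not the authors' implementation: "collocated tokens" is interpreted here as unigrams plus adjacent-token bigrams, and Laplace smoothing is an added assumption.

```python
import math
import re
from collections import Counter, defaultdict

def features(text):
    """Lowercased tokens plus adjacent-token bigrams (assumed reading of
    'collocated tokens')."""
    tokens = re.findall(r"[a-z0-9@#']+", text.lower())
    bigrams = [a + "_" + b for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams

class NaiveBayes:
    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total = sum(self.feature_counts[label].values())
            score = math.log(n_docs / total_docs)          # class prior
            for f in features(text):
                if f in self.vocab:                        # ignore unseen features
                    count = self.feature_counts[label][f]
                    # Laplace-smoothed likelihood of the feature given the class.
                    score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Tiny hypothetical training set; the slide mentions 10k labeled tweets.
model = NaiveBayes().fit(
    ["I got a new puppy", "I love my new kitten",
     "New campaign launched today", "Breaking news campaign launched"],
    ["personal", "personal", "other", "other"],
)
print(model.predict("I got a new kitten"))
```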

Page 33: Observational studies in social media

Event identification

Example tweet: “I got a new kitten and he has blue eyes and stripes and I need a good name but nothing that’s normal”

Extracted phrases:

● I got a new kitten

● he has blue eyes

● stripes

● I need a good name

● but nothing that’s normal

Canonicalization: I got a new kitten == got a cat, got a new cat, …

Kıcıman, Emre, and Matthew Richardson. "Towards decision support and goal achievement: Identifying action-outcome relationships from social media." KDD 2015.

Page 34: Observational studies in social media

Alignment

Page 35: Observational studies in social media

Alignment and matching

Page 36: Observational studies in social media

Compare with both neighboring quadrants

Page 37: Observational studies in social media

Example subsequents

Group | Event | Example | PosNeg
Pros | cat named | We just got a cat and named it Versace | 0.70
Pros | I’ve got a cat | I’ve got a kitten asleep on my lap, and my heart has softened. | 0.67
Pros | Love my new kitten | I love my new kitten | 0.88
Cons | Ran upstairs | But I ran upstairs and fell and now my head hurts | 0.20
Cons | Damn kitten | … no sleep because the damn kitten kept going nuts… | 0.22
Cons | Cat is literally | My cat is literally the devil | 0.31

Page 38: Observational studies in social media

Example precedents

● Event: “personal record” in marathon

[Chart; x-axis: days before marathon]

Page 39: Observational studies in social media

Improving matching

● Matching ideally should take many elements into account

● Can we take all the elements we know?
– Yes!

● Propensity matching matches by P(T=1), the estimated probability of receiving the treatment given what is known about the subject

Page 40: Observational studies in social media

Propensity matching stratification

Page 41: Observational studies in social media

Propensity matching stratification

Features of a user are all of their past events

Propensity score (PS) estimator trained with the averaged perceptron learning algorithm; the extracted timelines are the training data.

Decile stratification
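Decile stratification can be sketched as ranking users by estimated propensity score and cutting the ranking into ten equal-sized groups. The scores are assumed to come from some already trained estimator; the function names are illustrative.

```python
def assign_strata(scores, n_strata=10):
    """Return a stratum index in 0..n_strata-1 for each score, by rank,
    so that strata have (nearly) equal size."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata

def stratum_balance(scores, treated, n_strata=10):
    """Count [n_control, n_treated] in every stratum."""
    counts = [[0, 0] for _ in range(n_strata)]
    for stratum, is_treated in zip(assign_strata(scores, n_strata), treated):
        counts[stratum][1 if is_treated else 0] += 1
    return counts

# Hypothetical scores: 20 users with evenly spread propensities.
scores = [i / 20 for i in range(20)]
treated = [s >= 0.5 for s in scores]
print(stratum_balance(scores, treated))
```

Low-score strata will typically hold mostly control users and high-score strata mostly treated users, which is why comparisons are made within each stratum.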

Page 42: Observational studies in social media

Propensity score matching

● You got a kitten

● According to what's known about your past, your probability of getting a kitten was x

● You will be matched with someone whose probability of getting a kitten was also x
– But who did not get a kitten

● Every stratum has a different imbalance
– Which is predictable

Page 43: Observational studies in social media

Matching design

● 39 situations in 9 groups

● Outcome is a binary variable

● Average effect: P(outcome|T) - P(outcome|C)
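A sketch of the average effect P(outcome|T) - P(outcome|C) computed within propensity strata and then combined. Weighting strata by their size and skipping strata that lack either group are assumptions, not details given on the slide.

```python
def stratified_effect(records):
    """records: iterable of (stratum, treated, outcome), with boolean
    treated/outcome flags. Returns the size-weighted average over strata of
    P(outcome | treated) - P(outcome | control), or None if no stratum
    contains both groups."""
    by_stratum = {}
    for stratum, treated, outcome in records:
        # cell[True] and cell[False] hold [n_users, n_with_outcome].
        cell = by_stratum.setdefault(stratum, {True: [0, 0], False: [0, 0]})
        cell[bool(treated)][0] += 1
        cell[bool(treated)][1] += int(bool(outcome))
    weighted_sum, total_weight = 0.0, 0
    for cell in by_stratum.values():
        (n_t, y_t), (n_c, y_c) = cell[True], cell[False]
        if n_t == 0 or n_c == 0:
            continue                      # no comparison possible in this stratum
        weight = n_t + n_c
        weighted_sum += weight * (y_t / n_t - y_c / n_c)
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None

# Hypothetical records: (stratum, treated, outcome mentioned afterwards).
records = [
    (0, True, True), (0, True, False), (0, False, False), (0, False, False),
    (1, True, True), (1, False, True), (1, False, False), (1, False, False),
    (2, True, True),                      # stratum without controls is skipped
]
print(stratified_effect(records))
```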

Page 44: Observational studies in social media

Example: having high triglycerides level

Outcome Count Absolute Increase Z-Score

Your_risk 46 24.8% 18.12

Statin 48 23.1% 17.69

Lower 120 35.9% 17.18

Cardiovascular 54 23.0% 16.72

Healthy_diet 55 19.3% 16.54

Fatty_acid 29 18.3% 16.37

Help_prevent 73 26.9% 16.01

Risk_factor 33 18.3% 15.55

Fish_oil 48 24.4% 15.42

inflammation 78 25.1% 15.30

Page 45: Observational studies in social media

Example: having belly fat

Outcome Count Absolute Increase Z-Score

Burn 156 62.2% 8.96

Ab_workout 13 8.5% 5.82

Workout_lose 13 8.5% 5.82

Help_burn 8 11.1% 5.82

add_video 26 14.0% 5.75

url_playlist 26 14.0% 5.75

Fitness 39 18.6% 5.51

Ab 43 19.1% 5.51

Playlist_mention 30 15.3% 5.39

Biceps 7 4.7% 4.74

Page 46: Observational studies in social media
Page 47: Observational studies in social media

Evaluation

Labeling by non-experts (Mechanical Turk workers)

Usual measures: precision and recall

Page 48: Observational studies in social media

Precision @ Rank
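Precision at rank k is the fraction of the top k ranked outcomes that were labeled as relevant. A minimal sketch, assuming the crowdsourced labels are available as a list of booleans in ranked order:

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k items labeled relevant.
    ranked_labels: booleans ordered from highest- to lowest-ranked item.
    Missing items (fewer than k in the list) count as not relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for label in ranked_labels[:k] if label) / k

# Hypothetical labels for five ranked outcomes.
labels = [True, True, False, True, False]
for k in (1, 3, 5):
    print(k, precision_at_k(labels, k))
```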

Page 49: Observational studies in social media

Summary

● No matching
– Requires randomization into treatment and control groups

● Matching
– Ideally is done on all known variables

● Propensity score matching
– Powerful tool to combine known variables

● Be very skeptical about your results!

Page 50: Observational studies in social media

Event: Install net to keep cat inside the house

Page 51: Observational studies in social media

Outcome: Learning that cats do whatever they want