A Study of social influence in diffusion of innovation over Facebook
Shaomei Wu
Information Science
Cornell University
Information Science Breakfast, Dec 5, 2008
Diffusion of Innovation
“Diffusion is the process in which an innovation is communicated through certain channels over time among the members of a social system.”
  Everett M. Rogers *
- “innovation”: Friendship Quiz, a Facebook application
- “communicated”: invitations among Facebook friends
- “time”: September 25, 2008 to now
- “social system”: Facebook
* Rogers, Everett M. (2003). Diffusion of Innovations, 5th ed. New York, NY: Free Press, pp. 5-6.
Basic Diffusion Models
Threshold Model ⇔ Cascade Model
Statistically equivalent *
* David Kempe, Jon Kleinberg, Eva Tardos. Maximizing the Spread of Influence through a Social Network. KDD, 2003.
Cascade Model
Each recommendation will succeed with certain probability.
[Figure: example cascade over nodes a to l. Legend: adopter, non-adopter, social link, recommendation. Each recommendation from u to v is labeled with its success probability p_uv: p_ab, p_ac, p_ad, p_ae, p_af, p_ag, p_di, p_dj, p_gk, p_gl.]
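The cascade model described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the edge list is read off the probability labels in the figure, the value 0.3 is a placeholder (the slides give no numeric probabilities), and the function name `simulate_cascade` is made up for this example.

```python
import random

# Edge list read off the probability labels in the figure (p_ab, p_ac, ...).
# The value 0.3 is a placeholder; the slides give no numeric probabilities.
EDGES = {
    ("a", "b"): 0.3, ("a", "c"): 0.3, ("a", "d"): 0.3,
    ("a", "e"): 0.3, ("a", "f"): 0.3, ("a", "g"): 0.3,
    ("d", "i"): 0.3, ("d", "j"): 0.3,
    ("g", "k"): 0.3, ("g", "l"): 0.3,
}


def simulate_cascade(edges, seeds, rng):
    """Run one cascade and return the set of adopters."""
    # Index outgoing recommendations by sender.
    outgoing = {}
    for (u, v), p in edges.items():
        outgoing.setdefault(u, []).append((v, p))

    adopters = set(seeds)
    frontier = list(seeds)
    while frontier:
        u = frontier.pop()
        for v, p in outgoing.get(u, []):
            # Each recommendation u -> v is tried once and
            # succeeds with probability p.
            if v not in adopters and rng.random() < p:
                adopters.add(v)
                frontier.append(v)
    return adopters


# One random run starting from adopter "a".
print(sorted(simulate_cascade(EDGES, ["a"], random.Random(0))))
```

Replacing the placeholder 0.3 with per-edge estimates of p_uv is exactly the problem the rest of the talk addresses.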
Question: how to estimate p_uv?
Current practice:
- Constant [1]
- Based ONLY on network structure (e.g., in/out-degree) [2]
[1] Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst, Cascading Behavior in Large Blog Graphs. SDM 2007.
[2] Jure Leskovec, Lada Adamic, Bernardo Huberman. The Dynamics of Viral Marketing. ACM Conference on Electronic Commerce (EC) 2006.
Do individuals and the social relationship among them matter?
Theories from Empirical Diffusion Research:
Opinion leaders: people who have “greater exposure to mass media than their followers”, “are more cosmopolite”, “have greater social participation”, “have higher socioeconomic status”, and “are more innovative” [Rogers 2003, pp. 316-318].
Heterophily: differences between participants on certain attributes (i.e., education and socioeconomic status) are important in determining the efficiency of diffusion, even though “more effective communication occurs when two or more individuals are homophilous” [Rogers 2003, p. 19].
This project is to…
- Model the p_uv's for the cascade model
  - Identify the most influential factors in determining p_uv
  - Predict the success of contagion
- Exploit Facebook data
  - A real-world, ongoing diffusion instance
  - Rich and (most of the time) trustworthy profile information on individuals and their social connections/activities
  - A precisely timestamped diffusion process, with a complete log of events
Status
Launched: Sep 25, 2008. Data used so far runs through Nov 25, 2008.
216 adopters, 375 individuals, 737 edges between 266 pairs of people, 90 successful infections, 178 failed infections
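As a rough point of reference, the infection counts above give the simplest possible estimate of a single, constant success probability, along with the accuracy of a trivial classifier that always predicts failure. This is a back-of-the-envelope sketch; neither number is reported in the slides, and the function name is made up for this example.

```python
successes = 90   # successful infections reported above
failures = 178   # failed infections reported above


def constant_p(successes, failures):
    """Fraction of invitations that succeeded: one shared estimate for every p_uv."""
    return successes / (successes + failures)


p_hat = constant_p(successes, failures)
# Accuracy of always predicting the more common outcome (failure).
majority_baseline = max(successes, failures) / (successes + failures)

print(f"constant p_uv estimate: {p_hat:.3f}")
print(f"always-predict-failure accuracy: {majority_baseline:.3f}")
```

Any learned model of p_uv should beat the always-predict-failure accuracy to be useful.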
Network Evolution (in the first month after release)
[Figure: political view distribution. Categories: conservative, moderate, liberal, Libertarian, Democratic Party, Republican Party, Apathetic, other. Axis: # of people (0 to 12). Series: adopters, non-adopters.]
[Figure: religious view distribution. Categories: Christian, Muslim, Other. Axes: religion, count (0 to 16). Series: adopters, non-adopters.]
[Figure: gender distribution. Groups: adopters, non-adopters. Axis: # of people (0 to 90). Series: female, male. Bar labels: 82, 47, 56, 26.]
[Figure: age distribution. Axes: age, people count. Series: Non-adopter, Adopter.]
Predict the success of invitation with SVM
A binary classifier: each invitation is either successful or failed.
Features:
- Individual features
- Pair features (homophily/heterophily)
Individual Features
- # of events attended/invited
- # of photos tagged
- # of wall posts
- # of networks
- # of groups participated
- # of notes
- Religion
- Political view
- Gender
- Age
- Culture background
- Relationship status
- Work info
- Education info
Feature groups labeled on the slide: Social Activeness, Socioeconomics, Education, Innovativeness
Pair-wise Features
- Age difference
- Same gender?
- Same political view?
- Same religion?
- Same culture background?
- # of same networks
- # of photos both tagged
- # of groups both participated
- # of events both attended
- Same education level?
- Same high school?
- Same college?
- Same workplace?
- Same current city?
Feature groups labeled on the slide: Biological traits, Socioeconomics, Proximity, Belief
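A few of the pair-wise features listed above could be derived from two profile records roughly as follows. This is a sketch under assumptions: the profile dictionaries, their key names, and the function `pair_features` are hypothetical and do not reflect the actual Facebook data schema used in the study.

```python
def pair_features(sender, receiver):
    """Build homophily/heterophily features for one (sender, receiver) pair."""

    def same(key):
        a, b = sender.get(key), receiver.get(key)
        # Treat a missing value as "not the same" rather than guessing.
        return int(a is not None and a == b)

    def shared(key):
        # Number of items (networks, groups, ...) both people have in common.
        return len(set(sender.get(key, ())) & set(receiver.get(key, ())))

    return {
        "age_difference": abs(sender["age"] - receiver["age"]),
        "same_gender": same("gender"),
        "same_political_view": same("political_view"),
        "same_religion": same("religion"),
        "same_college": same("college"),
        "common_networks": shared("networks"),
        "common_groups": shared("groups"),
    }


# Hypothetical profiles, for illustration only.
alice = {"age": 24, "gender": "female", "religion": "Christian",
         "college": "Cornell", "networks": ["Cornell", "Ithaca, NY"],
         "groups": ["g1", "g2"]}
bob = {"age": 27, "gender": "male", "religion": "Christian",
       "college": "Cornell", "networks": ["Cornell"], "groups": ["g2", "g3"]}

print(pair_features(alice, bob))
```

Missing profile fields (here, political view) are mapped to 0, which is one possible convention; the slides do not say how missing values were handled.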
Training Data
Each invitation is a training example for machine learning.
Columns: time | sender | receiver | class | sender features | receiver features | pair features
2008-09-25 18:25:41 | 589483260 | 3621185 | 1 | 1:22 2:1 3:0 4:0 5:0 6:1 … … | 35:1 47:0 48:0 49:0 50:0 51:0 … … | 68:0 69:0 70:0 74:1 76:1 … …
2008-09-25 18:25:49 | 3621185 | 571023231 | -1 | … | … | …
… | … | … | … | … | … | …
2008-11-24 02:40:34 | 768059413 | 81405257 | -1 | … | … | …
* All numerical features are normalized across examples.
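The feature columns use sparse `index:value` pairs, the same notation SVM-light expects in its input files. The sketch below shows one way such a record could be parsed and written out as a single SVM-light line; the function names are made up for this example, and only the tokens visible on the slide are used (elided `…` entries are skipped, not filled in).

```python
def parse_features(text):
    """Parse 'index:value' tokens into a {index: value} dict."""
    features = {}
    for token in text.split():
        if ":" not in token:
            continue  # skip the '…' elision markers shown on the slide
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return features


def to_svmlight(label, features):
    """Format one example as '<label> <index>:<value> ...' with ascending indices."""
    pairs = " ".join(f"{i}:{features[i]:g}" for i in sorted(features))
    return f"{label} {pairs}"


# Visible part of the first record on the slide.
sender_part = "1:22 2:1 3:0 4:0 5:0 6:1 … …"
receiver_part = "35:1 47:0 48:0 49:0 50:0 51:0 … …"
pair_part = "68:0 69:0 70:0 74:1 76:1 … …"

features = {}
for part in (sender_part, receiver_part, pair_part):
    features.update(parse_features(part))

print(to_svmlight(1, features))
```

Because the three feature groups occupy disjoint index ranges, merging them into one dictionary is enough to build the full feature vector for an invitation.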
AdaBoost (with DecisionStump): a popular way to do feature selection.
Selected features:
- sender wall post count
- sender group count
- sender network count
- receiver age
- receiver group count
- sender & receiver common group count
Performance (10-fold cross-validation): accuracy 83.6%
Class  Precision  Recall
-1     83.5%      93.8%
1      83.8%      63.3%
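To make the per-class numbers concrete, the sketch below computes precision, recall, and accuracy from a confusion matrix. The four counts are back-calculated so that they match the reported percentages and the 90 successful / 178 failed invitations; they are an inference, not figures given on the slide.

```python
def precision_recall(hits, false_alarms, misses):
    """Precision and recall for one class."""
    return hits / (hits + false_alarms), hits / (hits + misses)


# Back-calculated counts (an inference, not reported on the slide).
tp, fn = 57, 33    # successful invitations predicted as 1 / as -1
tn, fp = 167, 11   # failed invitations predicted as -1 / as 1

pos_precision, pos_recall = precision_recall(tp, fp, fn)   # class 1
neg_precision, neg_recall = precision_recall(tn, fn, fp)   # class -1
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"class  1: precision {pos_precision:.1%}, recall {pos_recall:.1%}")
print(f"class -1: precision {neg_precision:.1%}, recall {neg_recall:.1%}")
print(f"accuracy: {accuracy:.1%}")
```

Read this way, the classifier misses about a third of the successful invitations while rarely flagging a failed one as successful.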
SVM performance: SVM-light (10-fold cross-validation), all values in %
fold accuracy precision recall
1 80.77 100 58.33
2 80.77 100 44.44
3 88.46 100 62.5
4 76.92 50 33.33
5 73.08 100 30
6 84.62 100 50
7 69.23 50 50
8 76.92 100 53.85
9 88.46 100 66.67
10 88.24 80 57.14
average 80.747 88 50.626
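The "average" row is the plain mean of each column over the ten folds, which can be checked directly from the table:

```python
from statistics import mean

# (accuracy, precision, recall) per fold, copied from the table above.
folds = [
    (80.77, 100, 58.33), (80.77, 100, 44.44), (88.46, 100, 62.5),
    (76.92, 50, 33.33), (73.08, 100, 30), (84.62, 100, 50),
    (69.23, 50, 50), (76.92, 100, 53.85), (88.46, 100, 66.67),
    (88.24, 80, 57.14),
]

# Transpose rows into columns, then average each column.
accuracy, precision, recall = (mean(column) for column in zip(*folds))
print(f"accuracy {accuracy:.3f}, precision {precision:.3f}, recall {recall:.3f}")
```

Note the wide spread across folds (precision ranges from 50 to 100), which is expected when each fold holds only a few dozen invitations.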
Weights from SVM
[Figure: feature weight distribution. Axes: feature, weight (-0.4 to 1.8). Labeled features: sender_isOther, sender_isInARelationship, receiver_isAtheist/Agnostic, receiver_isWorking, receiver_eventCount, receiver_groupCount, receiver_photoTagged, sameWorkPlace, receiver_age, sameReligion, sameCollege, receiver_isMiddleEastern, receiver_isMuslim, receiver_isChristian, sender_isCollege, sender_isWorking, sender_isChristian, sender_isModerate, sender_age, receiver_isRepublic, sender_wallCount, sender_isMarried, receiver_noteCount, sender_networkCount.]
Result
SVM-light performance: 209 records split into 5 folds, 4 for training and 1 for testing. Performance on the testing set:
- Accuracy: 71.43% (30 correct, 12 incorrect, 42 total)
- Precision/recall: 55.56%/38.46%
[Figure: feature weights distribution. Points are labeled with feature indices (1 to 35); axis ticks run from 0 to 40 and from -0.6 to 1.4.]
Top weighted features:
- 8: sender_events_invited
- 4: sender_friend_count
- 11: sender_gender
- 35: receiver_is_It's Complicated
- 5: sender_wall_post_count
- 9: sender_note_count
- 27: sender_is_In a Relationship
So the story could be: when a sender who has been invited to more Facebook events, has more friends, has written more Facebook notes (blog entries), is female, has fewer wall posts, and is in a relationship tries to infect a person whose relationship status is “It’s Complicated”, the infection is more likely to happen than in other cases.
SVM with features selected by AdaBoost
fold accuracy precision recall
1 80.77 100 58.33
2 80.77 83.33 55.56
3 88.46 100 62.5
4 73.08 0 0
5 76.92 100 40
6 84.62 83.33 62.5
7 76.92 66.67 50
8 80.77 100 61.54
9 96.15 100 88.89
10 91.18 83.33 71.43
average 82.96 81.67 55.075
Background
Diffusion of Innovation
Questions:
- How does it work in large online social networks?
- What are the key factors in determining the success of infection?
- Can we predict the propagation path?
Hypothesis: social influence depends on 5 dimensions of similarity:
- geographical distance: current location (country/state/city), current school, current major, year of class, current workplace, current courses enrolled
- background similarity: sex, sexual preference, dating interest, relationship interest, relationship status, birthday, political view, religious view, hometown address, previous school, previous workplace
- social similarity: number of mutual networks they belong to, number of mutual friends
- interest similarity: activities, favorite books, favorite music, favorite movies, favorite TV shows, favorite quotes
- social status distance: difference in number of friends, difference in wall post counts, difference in counts of messages sent and received, difference in counts of notes
Project Description
Objectives:
- Identify the key factors for social influence
- Predict the occurrence of adoption based on the key factors
Friendship Quiz:
- A Facebook application we developed
- Enables users to make quizzes and send them to their friends (take a peek!)
- We track the spread of the application
Highlights
- A real-world diffusion of innovation
- Rich and (most of the time) trustworthy profile information on individuals and their social connections/activities
- A precisely timestamped diffusion process, with a complete log of events
- An ongoing diffusion process
Backup: Threshold Model