Research Literature on Social Scores

SOCIAL SCORES Supervised by Dr. Dilum Bandara E.A.M.M Edirisinghe 138211N

description

This presentation discusses some of the literature that I have reviewed for my research project on Social Scores.

Transcript of Research Literature on Social Scores

Page 1: Research Literature on Social Scores

SOCIAL SCORES

Supervised by Dr. Dilum Bandara

E.A.M.M Edirisinghe 138211N

Page 2: Research Literature on Social Scores

Outline

• TunkRank – A Twitter Analog to PageRank
• TwitterRank – Finding Topic-sensitive Influential Twitterers
• InfluenceRank – An Efficient Social Influence Measurement
• Why your Klout score is meaningless
• Research Questions Addressed

Page 3: Research Literature on Social Scores

Research Questions

• How do existing systems calculate social scores?
• Which parameters are representative of a user’s true social influence?
• What are the desirable properties of a social score?
• How to vary the parameter weights across different topics and applications?
• How to come up with a performance-efficient algorithm?
• How to calculate a social score and update it in real time?

Page 4: Research Literature on Social Scores

TunkRank – A Twitter Analog to PageRank

Page 5: Research Literature on Social Scores

TunkRank

• Proposed by Daniel Tunkelang in 2009
• Implemented by Jason Adams
• Assumptions:
– Influence(X): the expected number of people who will read a tweet that X tweets
– The probability that X will read a tweet posted by Y is 1/||Following(X)||, where X is a member of Followers(Y) and Following(X) is the set of people that X follows
– If X reads a tweet from Y, there is a constant probability p that X will retweet it
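The three assumptions lead to a recursive definition that can be evaluated iteratively. The sketch below is only an illustration, not Jason Adams's implementation. It assumes the recurrence Influence(Y) = Σ over X in Followers(Y) of (1 + p · Influence(X)) / ||Following(X)||, which follows from the assumptions on this slide. The function name `tunkrank`, the default value of `p`, and the fixed iteration count are hypothetical choices.

```python
def tunkrank(following, p=0.05, iterations=50):
    """Iteratively estimate TunkRank-style influence.

    following maps each user to the set of users they follow.
    p is the assumed constant retweet probability.
    """
    users = set(following)
    for followed in following.values():
        users |= set(followed)

    # Invert the graph: followers[y] is the set of users who follow y.
    followers = {u: set() for u in users}
    for x, followed in following.items():
        for y in followed:
            followers[y].add(x)

    influence = {u: 0.0 for u in users}
    for _ in range(iterations):
        # Each follower x reads y's tweet with probability 1/|Following(x)|
        # and passes it on to their own audience with probability p.
        influence = {
            y: sum(
                (1 + p * influence[x]) / len(following[x])
                for x in followers[y]
            )
            for y in users
        }
    return influence


# Toy chain: a follows b, b follows c.
scores = tunkrank({"a": {"b"}, "b": {"c"}, "c": set()}, p=0.5)
```

In the toy chain, c gains influence both from its direct reader b and, with probability p, from b's reader a.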

Page 6: Research Literature on Social Scores

TunkRank

• Hard to game
• Addresses the inflation that occurs when people follow others in the hope of reciprocity
• Does not consider how a person allocates attention among the people he or she follows

Page 7: Research Literature on Social Scores

TwitterRank – Finding Topic-sensitive Influential Users

Page 8: Research Literature on Social Scores

TwitterRank

Two main contributions:
• Report homophily in Twitter
• Introduce TwitterRank to measure the topic-sensitive influence of twitterers

Page 9: Research Literature on Social Scores

Framework for the Proposed Approach

Topic Distillation → Topic-specific Relationship Network Construction → Topic-sensitive User Influence Ranking

Page 10: Research Literature on Social Scores

Dataset

• Consider a set S of the top-1000 Singapore-based twitterers, |S| = 996.
• Crawled all followers and friends of each s ∈ S and stored them in set S’.
• Let S’’ = S ∪ S’ and S* = {s | s ∈ S’’, s is from Singapore}. |S*| = 6748.
• For each s ∈ S*, crawled all the tweets published, T. |T| = 1,021,039.

Page 11: Research Literature on Social Scores

Reciprocity in Following Relationships

• 72.4% of the twitterers follow more than 80% of their followers
• 80.5% of the twitterers have 80% of their friends follow them back

Casual following or homophily?

Page 12: Research Literature on Social Scores

Homophily in Twitter

• Question 1: Are twitterers with “following” relationships more similar than those without, according to the topics they are interested in?
• Question 2: Are twitterers with reciprocal “following” relationships more similar than those without, according to the topics they are interested in?

Page 13: Research Literature on Social Scores

Topic Modeling

• Goal: automatically identify the topics that twitterers are interested in, based on the tweets they published.
• The Latent Dirichlet Allocation (LDA) model is applied.

Page 14: Research Literature on Social Scores

Topic Modeling Results

DT: a D×T matrix
• D: number of users
• T: number of topics
• DT_ij: number of times a word in user s_i’s tweets has been assigned to topic t_j

Page 15: Research Literature on Social Scores

Hypothesis Testing

• Applied to the set of twitterers who published more than 10 tweets in total, $S_u^*$, with $|S_u^*| = 4050$.
• Row-normalize the DT matrix as DT’ such that $\|DT'_{i\cdot}\|_1 = 1$ for each row $DT'_{i\cdot}$.
• Each row of DT’ is then the probability distribution of twitterer s_i’s interest over the T topics.
• Measure the topical difference between twitterers.
• Formalize each question as a two-sample t-test; the tests confirm the existence of homophily in the Twitter dataset.

There are twitterers who are serious about following others.
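The row normalization and the two-sample comparison can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure. The slide does not say which t-test variant or which topical-difference measure was used, so the sketch uses Welch's unequal-variance statistic and takes the pairwise differences as ready-made lists of numbers. The names `row_normalize` and `welch_t` are hypothetical.

```python
import math
import statistics


def row_normalize(dt):
    """Scale each row of a count matrix so that it sums to 1."""
    normalized = []
    for row in dt:
        total = sum(row)
        # A user with no assigned words keeps an all-zero row.
        normalized.append([c / total for c in row] if total else list(row))
    return normalized


def welch_t(sample_a, sample_b):
    """Two-sample t statistic without assuming equal variances (assumption)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Toy DT matrix: 2 users, 3 topics.
dt_prime = row_normalize([[2, 2, 4], [1, 0, 3]])

# Made-up topical differences for linked and unlinked pairs of twitterers.
t_stat = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A clearly negative statistic for linked versus unlinked pairs would indicate that linked twitterers differ less in their topics, which is the pattern that homophily predicts.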

Page 16: Research Literature on Social Scores

Topic-specific TwitterRank

• Forms a directed graph D(V, E)
– There is an edge between two twitterers if there is a “following” relationship between them
– The edge is directed from follower to friend
• A topic-specific random walk model is applied to calculate each user’s influence score.
• The transition matrix for topic t is denoted as P_t. The transition probability of the surfer from follower s_i to friend s_j is:

$$P_t(i, j) = \frac{|T_j|}{\sum_{a:\, s_i \text{ follows } s_a} |T_a|} \cdot \mathrm{sim}_t(i, j)$$

$$\mathrm{sim}_t(i, j) = 1 - |DT'_{it} - DT'_{jt}|$$

where |T_j| is the number of tweets published by s_j.
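The transition probability for a single follower-friend pair can be sketched as follows. This is an illustration rather than the authors' code. It reads the formula as the friend's share of all tweets published by the follower's friends, scaled by the topical similarity 1 − |DT’_it − DT’_jt|. The function name, the dictionary-based inputs, and the choice to return zero when there is no following edge are assumptions.

```python
def transition_probability(i, j, t, friends, tweet_counts, dt_prime):
    """P_t(i, j) for follower i and friend j on topic t.

    friends[i] is the set of users that i follows, tweet_counts[u] is the
    number of tweets published by u, and dt_prime[u][t] is u's
    row-normalized interest in topic t.
    """
    if j not in friends[i]:
        return 0.0  # assumption: no edge means no transition
    total_tweets = sum(tweet_counts[a] for a in friends[i])
    if total_tweets == 0:
        return 0.0
    similarity = 1 - abs(dt_prime[i][t] - dt_prime[j][t])
    return tweet_counts[j] / total_tweets * similarity


# Toy data: u follows v and w; two topics.
friends = {"u": {"v", "w"}, "v": set(), "w": set()}
tweet_counts = {"u": 10, "v": 30, "w": 10}
dt_prime = {"u": [0.5, 0.5], "v": [0.75, 0.25], "w": [0.5, 0.5]}

p_uv = transition_probability("u", "v", 0, friends, tweet_counts, dt_prime)
p_uw = transition_probability("u", "w", 0, friends, tweet_counts, dt_prime)
```

In the toy data, v publishes three quarters of the tweets that u sees but differs from u on topic 0, so its share is discounted by the similarity term.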

Page 17: Research Literature on Social Scores

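Topic-specific TwitterRank

• Topic-specific teleportation:

$$E_t = DT''_{\cdot t}$$

where DT’’ is the column-normalized form of DT and $DT''_{\cdot t}$ is its t-th column.

• The influence scores of twitterers are calculated iteratively:

$$TR_t = \gamma P_t \times TR_t + (1 - \gamma) E_t$$

where γ is a parameter between 0 and 1 that controls the probability of teleportation.

• Aggregation of topic-specific TwitterRank:

$$TR = \sum_t r_t \cdot TR_t$$

where r_t is the weight assigned to topic t.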
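The iterative update and the aggregation step can be sketched with plain lists. This is a hedged illustration, not the paper's implementation. It assumes the update blends propagated rank with the teleportation vector through a parameter gamma between 0 and 1, and that rank flows from follower i to friend j, which is the usual random-walk reading. The default `gamma`, the fixed iteration count, and the function names are hypothetical.

```python
def twitterrank(p_t, e_t, gamma=0.85, iterations=100):
    """Iterate TR_t = gamma * P_t x TR_t + (1 - gamma) * E_t for one topic.

    p_t[i][j] is the transition probability from follower i to friend j,
    and e_t[j] is the teleportation weight of user j for this topic.
    """
    n = len(e_t)
    tr = list(e_t)
    for _ in range(iterations):
        tr = [
            gamma * sum(p_t[i][j] * tr[i] for i in range(n))
            + (1 - gamma) * e_t[j]
            for j in range(n)
        ]
    return tr


def aggregate(topic_ranks, topic_weights):
    """Combine per-topic ranks as TR = sum over t of r_t * TR_t."""
    n = len(topic_ranks[0])
    return [
        sum(r * tr[u] for r, tr in zip(topic_weights, topic_ranks))
        for u in range(n)
    ]


# Toy network: user 0 follows user 1.
tr_topic = twitterrank([[0.0, 1.0], [0.0, 0.0]], [0.5, 0.5], gamma=0.5)
overall = aggregate([tr_topic, [0.5, 0.5]], [0.5, 0.5])
```

The topic weights passed to `aggregate` decide whether the result is a general ranking or one biased toward particular topics.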

Page 18: Research Literature on Social Scores

Review

• Homophily does exist
• Some users still follow others for reasons other than topical similarity
• Easy to game
• An incremental approach to topic distillation still needs to be discussed

Page 19: Research Literature on Social Scores

InfluenceRank – An Efficient Social Influence Measurement

Page 20: Research Literature on Social Scores

InfluenceRank

• Defines the influence of a user from two perspectives
– User’s Relative Influence
– User’s Network Global Influence
• Defines the microblog network as SN = (G, B, I)
– G: link network structure
– B: set of interactive behaviors between each pair of associated users
– I: set of profile information of each user
• Graph G = (V, E)
– V: set of nodes, represented by users’ indices
– E: set of directed edges

Page 21: Research Literature on Social Scores

InfluenceRank

• Defines the set of behaviors B = (R, C, M)
– R: retweets
– C: comments
– M: mentions
• Defines the profile information set I = (P, T, K)
– P: set of numbers of postings
– T: set of users’ interest tags
– K: set of users’ content keywords

Page 22: Research Literature on Social Scores

Metrics Explored

• Number of followers
• Quality of followers
• Quality of tweets
• Similarity of interests
– Similarity of user interest tags: the function TS(vi, vj) calculates the similarity of interest tags between nodes vi and vj
– Similarity of user content keywords: the function KS(vi, vj) calculates the similarity of interests between nodes vi and vj based on their content keyword sets
– Similarity of two users’ interests: Sim(vi, vj) = TS(vi, vj) + KS(vi, vj)

Page 23: Research Literature on Social Scores

User Relative Influence Rank

• Considers three aspects
– The quality of postings
– The ratio of retweeting behavior
– The similarity of interests
• User’s relative influence function RI(vi, vj):
RI(vi, vj) = Q(vi) + R(vi, vj) + Sim(vi, vj)
where R(vi, vj) represents the retweet ratio of user vj to vi.
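The additive structure of Sim and RI can be sketched as follows. The slides do not define TS, KS, Q, or R, so this sketch makes loud assumptions: Jaccard similarity stands in for both TS and KS, and the posting quality and retweet ratio are passed in as precomputed numbers. All function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def interest_similarity(tags_i, tags_j, keywords_i, keywords_j):
    """Sim(vi, vj) = TS(vi, vj) + KS(vi, vj).

    Jaccard is an assumed stand-in for the undefined TS and KS functions.
    """
    return jaccard(tags_i, tags_j) + jaccard(keywords_i, keywords_j)


def relative_influence(quality_i, retweet_ratio_ij, similarity_ij):
    """RI(vi, vj) = Q(vi) + R(vi, vj) + Sim(vi, vj), equally weighted."""
    return quality_i + retweet_ratio_ij + similarity_ij


# Made-up profile data for two users.
sim = interest_similarity(
    {"music", "tech"}, {"tech"},
    {"python", "graphs"}, {"graphs", "ranking"},
)
ri = relative_influence(0.6, 0.25, sim)
```

Because the three terms are simply added, whichever term has the largest numeric range dominates RI unless the inputs are scaled to comparable ranges first.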

Page 24: Research Literature on Social Scores

User’s Network Global Influence Rank

• Combines structural features and users’ behavior characteristics
• User network global influence rank function: Influence(v)
• Damping factor λ = 0.85

Page 25: Research Literature on Social Scores

InfluenceRank Algorithm

Time complexity: O(e)

Page 26: Research Literature on Social Scores

InfluenceRank Algorithm

• Evaluated on a Tencent Weibo dataset and compared against the TunkRank algorithm
• Emphasizes users’ interactive behaviors
• The metrics considered in measuring a user’s relative influence are all given equal weight
• Considers the similarity of keywords instead of the similarity of topics
• Ignores the impact of negative comments and conversations
• The model is based on a snapshot of current relationships and interactions

Page 27: Research Literature on Social Scores


Why your Klout score is meaningless

Page 28: Research Literature on Social Scores

Why is your Klout score meaningless?

• Klout is far more similar to a derived measurement
• It is inconsistent and not trustworthy at the individual level

Page 29: Research Literature on Social Scores

What should a Klout score satisfy?

• Ordering by Klout should make sense in the real world
• The score should not be easy to game
• The score should be monotonic
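The monotonicity property can be probed empirically against any scoring function. The sketch below assumes that "monotonic" means additional activity (one more follower, retweet, or mention) never lowers the score. The function `is_monotonic` and the `toy_score` formula are hypothetical and are not Klout's actual scoring.

```python
def is_monotonic(score, profile, increments):
    """Return True if no single increment to a metric lowers the score.

    score is any function from a metrics dict to a number; profile and
    increments are dicts keyed by metric name.
    """
    baseline = score(profile)
    for metric, step in increments.items():
        bumped = dict(profile)
        bumped[metric] = bumped.get(metric, 0) + step
        if score(bumped) < baseline:
            return False
    return True


def toy_score(m):
    # Hypothetical score: rewards retweets, penalizes the number followed.
    return 2 * m["retweets"] + m["followers"] - m["following"]


profile = {"followers": 100, "following": 50, "retweets": 10}
ok = is_monotonic(toy_score, profile, {"followers": 1, "retweets": 1})
broken = is_monotonic(toy_score, profile, {"following": 1})
```

A check like this probes only one profile at a time, so a real evaluation would repeat it over many sampled profiles.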

Page 30: Research Literature on Social Scores

Klout score comparisons

• A set of individuals with Klout in the 40-49 range
• A set of individuals with Klout in the 55-64 range
• A set of individuals with Klout in the 70-79 range
• A set of individuals with Klout >= 80

Page 31: Research Literature on Social Scores

Group 3 (Klout 70-79)

• U1: Tim Ferriss – Author of The 4-Hour Workweek and The 4-Hour Body
• U2: Jack Dorsey – Executive Chairman of Twitter and CEO of Square
• U3: Matt Cutts – Head of the web spam team at Google
• U4: MG Siegler – Writer for TechCrunch
• U5: Klout – Influence score service
• U6: David Pogue – Tech guy from the NYT
• U7: Jeffrey Zeldman – Designer, writer, and publisher

Page 32: Research Literature on Social Scores

Group 3 (Klout 70-79)

[Table comparing users U1 to U7; values not recovered.] As of 29 May 2011.

Page 33: Research Literature on Social Scores

Klout violates the Desirable Properties

• Connecting an additional account will always increase the Klout score.
• The degree to which followers are influential seems to be irrelevant or to matter very little.
• Differences in the number of people someone follows seem to be irrelevant or to matter very little.
• In terms of value to the Klout score, the ordering follow < retweet < unique retweeter < unique mention can be inconsistent.
• In terms of value to the Klout score, the ordering like < comment can be inconsistent.

Page 34: Research Literature on Social Scores


Research Questions Addressed

Page 35: Research Literature on Social Scores

How do existing systems calculate social scores?

TunkRank
• Probability that the follower will read a tweet posted by the followee
• Probability that a tweet that has been read will be retweeted
• Number of followers and their influence

TwitterRank
• Measures the topic-sensitive influence of twitterers
• Considers the similarity between friends on topics
• Number of tweets published by all friends

InfluenceRank
• Defines a user’s relative and global influence
• Number of followers
• Quality of followers
• Quality of tweets
• Similarity of interests

Influence Measure with a Network Amplification Score
• Accounts for the content and conversation generated by considering the indegree and outdegree of the social network over multiple levels

Page 36: Research Literature on Social Scores

Which parameters are representative of a user’s true social influence?

• It’s not just the number of followers or the number of friends
• Link structure
• Following relationship
• Similarity
• Interactions
• Topics and communities
• Quality of followers, tweets, etc.

Page 37: Research Literature on Social Scores

What are the desirable properties of a social score?

• Ordering by the score should make sense in the real world
• The score should not be easy to game
• The score should be monotonic
• The equation should be simple and easy to understand/interpret
• The score should be meaningful

Page 38: Research Literature on Social Scores

References

• Daniel Tunkelang. (2009, Jan 13). A Twitter Analog to PageRank [Online]. Available: http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/
• Neal Richter. (2009, Feb 18). TunkRank Scoring Improvement [Online]. Available: http://aicoder.blogspot.com/2009/02/tunkrank-scoring-improvement.html
• Jianshu Weng et al., “TwitterRank: Finding Topic-sensitive Influential Twitterers,” in WSDM Conf., New York, USA, 2010, pp. 261-270.
• Wenlong Chen et al., “InfluenceRank: An Efficient Social Influence Measurement for Millions of Users in Microblog,” in 2nd Int. Conf. on CGC, Xiangtan, 2012, pp. 563-570.
• Alex Braunstein. (2011, June 01). Why your Klout score is meaningless [Online]. Available: http://alexbraunstein.com/2011/06/01/why-your-klout-score-is-meaningless/
• Sean Golliher. (2011, June 27). How I Reverse Engineered Klout Score to an R2 = 0.94 [Online]. Available: http://www.seangolliher.com/2011/uncategorized/how-i-reversed-engineered-klout-score-to-an-r2-094/

Page 39: Research Literature on Social Scores

Thank You
