TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER
![Page 1: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/1.jpg)
Manikandan Vijayakumar
Arizona State University, School of Computing, Informatics, and Decision Systems Engineering
Master's Thesis Defense, July 7th, 2014
![Page 2: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/2.jpg)
Orphaned Tweets
Source: Twitter
![Page 3: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/3.jpg)
Overview
![Page 4: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/4.jpg)
Twitter
• Twitter is a micro-blogging platform where users can be
  • Social
  • Informational, or
  • Both
• Twitter is, in essence, also
  • A Web search engine
  • A real-time news medium
  • A medium to connect with friends
Image Source: Google
![Page 5: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/5.jpg)
Why do people use Twitter?
According to research charts, people use Twitter for
• Breaking news
• Content discovery
• Information sharing
• News reporting
• Daily chatter
• Conversations
Source: Deutsche Bank Markets
![Page 6: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/6.jpg)
But...
According to Cowen & Co. predictions and report:
• Twitter had 241 million monthly active users at the end of 2013
• Twitter will reach only 270 million monthly active users by the end of 2014
• Twitter will be overtaken by Instagram, with 288 million monthly active users
• Users are not happy with Twitter
![Page 7: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/7.jpg)
Twitter Noise
![Page 8: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/8.jpg)
Noise in Twitter
• Missing hashtags
![Page 9: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/9.jpg)
Noise in Twitter
• Users may use incorrect hashtags
![Page 10: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/10.jpg)
Noise in Twitter
• Users may use too many hashtags
![Page 11: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/11.jpg)
Missing Hashtag Problem: Hashtags Are Supposed to Help
Possible Solutions
Importance of using hashtags:
• Hashtags provide context or metadata for arcane tweets
• Hashtags are used to organize the information in tweets for retrieval
• Hashtags help to find the latest trends
• Hashtags help to reach a larger audience
![Page 12: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/12.jpg)
Importance of Context in Tweet
![Page 13: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/13.jpg)
Orphaned Tweets Non-Orphaned Tweets
![Page 14: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/14.jpg)
Problem Solved?
Not all users use hashtags with their tweets.
• Eva Zangerle et al., 300 million tweets, 2013: 87% without hashtag, 13% with hashtag
• TweetSense dataset, 8 million tweets, 2014: 76% without hashtag, 24% with hashtag
But the problem still exists.
![Page 15: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/15.jpg)
Existing Methods
Existing systems address this problem by recommending hashtags based on:
• Collaborative filtering [Kywe et al., SocInfo, Springer 2012]
• Optimization-based graph method [Feng et al., KDD 2012]
• Neighborhood [Meshary et al., CNS 2013, April]
• Temporality [Chen et al., VLDB 2013, August]
• Crowd wisdom [Fang et al., WWW 2013, May]
• Topic models [Godin et al., WWW 2013, May]
Reference: Eva Zangerle, Wolfgang Gassler, Günther Specht, "On the impact of text similarity functions on hashtag recommendations in microblogging environments", Social Network Analysis and Mining, Springer, December 2013, Volume 3, Issue 4, pp. 889-898.
![Page 16: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/16.jpg)
Objective
How can we solve the problem of finding missing hashtags for orphaned tweets by providing more accurate suggestions for Twitter users?
• The user's tweet history
• Social graph
• Influential friends
• Temporal information
![Page 17: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/17.jpg)
Impact
• Aggregate tweets from users who don't use hashtags, for opinion mining
• Identify context
• Named entity problems
• Sentiment evaluation on topics
• Reduce noise in Twitter
• Increase active online users and social engagement
![Page 18: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/18.jpg)
Outline
• TweetSense
• (Chapter 3) Modeling the Problem
• (Chapter 4) Ranking Methods
• (Chapter 5) Binary Classification
• (Chapter 6) Experimental Setup
• (Chapter 7) Evaluation
• (Chapter 8) Conclusions
![Page 19: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/19.jpg)
Modeling the Problem
![Page 20: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/20.jpg)
Problem Statement
Hashtag Rectification Problem
What is the probability P(h | T, V) of a hashtag h given tweet T of user V?
Diagram: orphan tweet from user V → system → recommended hashtags
![Page 21: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/21.jpg)
![Page 22: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/22.jpg)
TweetSense
![Page 23: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/23.jpg)
Architecture
Diagram (learning-to-rank pipeline): a crawler and an indexer build the Twitter dataset; the user submits a username and a query tweet; the system retrieves the user's candidate hashtags from their timeline; a ranking model, produced by a learning algorithm from training data, returns the top K hashtags (#hashtag 1, #hashtag 2, ..., #hashtag K).
Source: http://en.wikipedia.org/wiki/File:MLR-search-engine-example.png
![Page 24: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/24.jpg)
A Generative Model for Tweet Hashtags
Hypothesis: when a user uses a hashtag,
• she might reuse a hashtag that she created before, present in her user timeline
• she may also reuse hashtags that she sees in her home timeline (created by the friends she follows)
  • she is more likely to reuse hashtags from the tweets of her most influential friends
  • she is more likely to reuse hashtags that are temporally close enough
![Page 25: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/25.jpg)
Build a Discriminative Model over a Generative Model
• To build a statistical model, we need to model P(<tweet-hashtag> | <tweet-social features>, <tweet-content features>)
• Rather than build a generative model, I go with a discriminative model
• A discriminative model avoids characterizing the correlations between the tweet features
• This gives the freedom to develop a rich class of social features
• I learn the discriminative model using logistic regression
![Page 26: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/26.jpg)
Retrieving Candidate Tweet Set
Diagram: the candidate tweet set is drawn from the timeline of user U, which is a subset of the global Twitter data.
![Page 27: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/27.jpg)
Feature Selection: Tweet Content Related
Two inputs to my system: the orphaned tweet and the user who posted it.
Tweet content related features:
• Tweet text
• Temporal information
• Popularity
![Page 28: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/28.jpg)
Feature Selection: User Related
User related features:
• @mentions
• Favorites
• Co-occurrence of hashtags
• Mutual friends
• Mutual followers
• Follower-followee relation
Features are selected based on my generative model: a user reuses hashtags from her own timeline, from her most influential friends, and from tweets that are temporally close enough.
![Page 29: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/29.jpg)
![Page 30: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/30.jpg)
![Page 31: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/31.jpg)
Ranking Methods
![Page 32: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/32.jpg)
List of Feature Scores

| Feature | Score |
|---|---|
| Tweet text | Similarity Score |
| Temporal information | Recency Score |
| Popularity | Social Trend Score |
| @mentions | Attention Score |
| Favorites | Favorite Score |
| Mutual friends | Mutual Friend Score |
| Mutual followers | Mutual Follower Score |
| Co-occurrence of hashtags | Common Hashtags Score |
| Follower-followee relation | Reciprocal Score |
![Page 33: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/33.jpg)
Similarity Score
• Cosine similarity is the most appropriate similarity measure compared with the alternatives (Zangerle et al.)
• The score is the cosine similarity between query tweet Qi and candidate tweet Tj
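The cosine similarity between a query tweet and a candidate tweet can be sketched as follows. This is a minimal illustration, not the thesis implementation: the whitespace tokenizer and raw term-frequency weighting are assumptions (the slide does not say how tweets are vectorized), and the function name `cosine_similarity` is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(query_tweet: str, candidate_tweet: str) -> float:
    """Cosine similarity of two tweets using raw term-frequency vectors.

    Assumption: lowercase whitespace tokenization; the slides do not
    specify the tokenizer or the term weighting.
    """
    q = Counter(query_tweet.lower().split())
    c = Counter(candidate_tweet.lower().split())
    # Dot product over the terms the two tweets share.
    dot = sum(q[term] * c[term] for term in q.keys() & c.keys())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_c = math.sqrt(sum(v * v for v in c.values()))
    if norm_q == 0.0 or norm_c == 0.0:
        return 0.0  # an empty tweet is treated as dissimilar to everything
    return dot / (norm_q * norm_c)
```

Tweets with no terms in common score 0, and the score grows toward 1 as the term distributions of the two tweets align.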
![Page 34: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/34.jpg)
Recency Score
An exponential decay function is used to compute the recency score of a hashtag, with:
• k = 3, which is set for a window of 75 hours
• qt = input query tweet
• Ct = candidate tweet
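The equation on this slide did not survive extraction, so the following is only one plausible reading of an exponential decay with k = 3 over a 75-hour window. The specific form exp(-k · Δt / window), the use of the absolute time difference, and the name `recency_score` are all assumptions, not the thesis formula.

```python
import math
from datetime import datetime, timedelta


def recency_score(query_time: datetime, candidate_time: datetime,
                  k: float = 3.0, window_hours: float = 75.0) -> float:
    """Exponential decay of the time gap between query and candidate tweet.

    Assumed form (not taken from the slides): exp(-k * dt / window),
    where dt is the absolute time difference in hours.
    """
    dt_hours = abs((query_time - candidate_time).total_seconds()) / 3600.0
    return math.exp(-k * dt_hours / window_hours)


# Hypothetical timestamps: a candidate tweet posted 25 hours before the query.
qt = datetime(2014, 5, 10, 12, 0)
ct = qt - timedelta(hours=25)
score = recency_score(qt, ct)  # equals exp(-1) under the assumed form
```

Under this assumed form, a candidate posted at the same instant as the query scores 1, and one at the edge of the 75-hour window scores exp(-3), which is close to 0.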
![Page 35: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/35.jpg)
Social Trend Score
• Measures the popularity of a hashtag h within the candidate hashtag set H
• The Social Trend score is computed using the "one person, one vote" approach
• The total count of each frequently used hashtag in Hj is computed
• The counts are max-normalized
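A minimal sketch of the counting and max normalization described here. It assumes that "one person, one vote" means each user contributes at most one vote per hashtag regardless of how often they used it; that reading, the input format, and the name `social_trend_scores` are assumptions.

```python
from collections import Counter


def social_trend_scores(usages):
    """Max-normalized popularity of each hashtag in the candidate set.

    `usages` is an iterable of (user, hashtag) pairs. Assumption: a user
    who used the same hashtag many times still casts only one vote for it.
    """
    votes = Counter(tag for _user, tag in set(usages))
    if not votes:
        return {}
    top = max(votes.values())
    return {tag: count / top for tag, count in votes.items()}


# Hypothetical candidate set: alice used #x twice but votes for it once.
scores = social_trend_scores([
    ("alice", "#x"), ("alice", "#x"), ("bob", "#x"), ("bob", "#y"),
])
```

The most popular hashtag always scores 1, and every other hashtag is scored relative to it.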
![Page 36: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/36.jpg)
Attention Score & Favorites Score
• The Attention score and the Favorites score capture the social signals between users
• They rank users based on recent conversation and favorite activity
• They determine which users are more likely to share topics of common interest
![Page 37: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/37.jpg)
Attention Score & Favorites Score: Equation
![Page 38: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/38.jpg)
Mutual Friend Score & Mutual Followers Score
• Gives the similarity between users
• Mutual friends: people who are friends with both you and the person whose timeline you're viewing
• Mutual followers: people who follow both you and the person whose timeline you're viewing
• The score is computed using the well-known Jaccard coefficient
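The Jaccard coefficient is the size of the intersection of two sets divided by the size of their union. A small sketch, with hypothetical account names, of how it could be applied to the friend lists of two users:

```python
def jaccard(a, b) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(union)


# Hypothetical friend lists for the query user and a candidate user.
friends_of_query_user = {"ann", "bob", "cat"}
friends_of_candidate = {"bob", "cat", "dan"}
mutual_friend_score = jaccard(friends_of_query_user, friends_of_candidate)
```

The same function applies unchanged to follower sets or to the sets of hashtags found in two timelines.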
![Page 39: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/39.jpg)
Common Hashtags Score
• Ranks users based on the co-occurrence of hashtags in their timelines
• I use the same Jaccard coefficient
![Page 40: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/40.jpg)
Reciprocal Score
• Twitter is asymmetric
• This score differentiates friends from accounts followed only as topics of interest, such as news channels and celebrities
![Page 41: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/41.jpg)
How to Combine the Scores?
• Combine all the feature scores into one final score to recommend hashtags
• Model this as a classification problem to learn the weights
• Each hashtag can be thought of as its own class, but modeling the problem as multi-class classification is challenging because my class labels number in the thousands
• So, I model this as a binary classification problem
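As an illustration of combining feature scores into a single value with learned weights, the sketch below applies the logistic function to a weighted sum, which is how a logistic regression model turns features into a probability. The weights and bias shown are made-up placeholders, not learned values from the thesis, and `combined_score` is a hypothetical name.

```python
import math


def combined_score(feature_scores, weights, bias=0.0):
    """Probability that a candidate hashtag is the right one.

    Logistic function applied to a weighted sum of the feature scores.
    """
    z = bias + sum(w * x for w, x in zip(weights, feature_scores))
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder weights for three of the nine scores (illustrative only).
weights = [2.0, 1.5, 0.5]
p = combined_score([0.4, 0.7, 0.1], weights, bias=-1.0)
```

Because each candidate is scored independently as "correct hashtag or not", the number of distinct hashtags does not affect the size of the model, which is the advantage of the binary formulation over a multi-class one.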
![Page 42: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/42.jpg)
![Page 43: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/43.jpg)
![Page 44: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/44.jpg)
Binary Classification
![Page 45: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/45.jpg)
Problem Setup
• Training dataset: tweet and hashtag pairs <Ti, Hj>, that is, tweets with known hashtags
• Test dataset: tweets without a hashtag <Ti, ?>; existing hashtags are removed from the tweets to provide ground truth
![Page 46: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/46.jpg)
Training Dataset
• The training dataset is a feature matrix containing the feature scores of every <CTi, CHj> pair belonging to each <Ti, Hj> pair
• The class label is 1 if CHj = Hj, and 0 otherwise
• Multiple hashtag occurrences are handled as single instances: <CT1 - CH1, CH2, CH3> = <CT1, CH1>, <CT1, CH2>, <CT1, CH3>

Feature matrix for a <Tweet(T1), Hashtag(H1)> pair, with one row per <Candidate Tweet, Candidate Hashtag> pair (CT1,CH1; CT2,CH2; ...; CTi,CHj):

| Pair | Similarity Score | Recency Score | Social Trend Score | Attention Score | Favorite Score | Mutual Friend Score | Mutual Followers Score | Common Hashtag Score | Reciprocal Rank | Class Label |
|---|---|---|---|---|---|---|---|---|---|---|
| CT1,CH1 | 0.095 | 0.0 | 0.00015 | 0.00162 | 0.0805 | 0.11345 | 0.0022 | 0.0117 | 1 | 1 |
| CT2,CH2 | 0.0 | 0.00061 | 0.520 | 0.0236 | 0.0024 | 0.00153 | 0.097 | 0.0031 | 0.5 | 0 |
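The labeling rule can be sketched as below: each candidate tweet is expanded into one row per hashtag it carries, and a row is labeled 1 only when its hashtag equals the ground-truth hashtag. The tuple layout, the case-insensitive comparison, and the name `build_training_rows` are assumptions made for illustration.

```python
def build_training_rows(ground_truth_hashtag, candidates):
    """Expand candidate tweets into labeled (tweet, hashtag) rows.

    `candidates` is a list of (candidate_tweet_id, hashtags, feature_scores).
    Assumption: hashtags are compared case-insensitively.
    """
    truth = ground_truth_hashtag.lower()
    rows = []
    for tweet_id, hashtags, feature_scores in candidates:
        for tag in hashtags:  # <CT1 - CH1, CH2, CH3> becomes three rows
            label = 1 if tag.lower() == truth else 0
            rows.append((tweet_id, tag, feature_scores, label))
    return rows


# Hypothetical candidate tweet carrying three hashtags.
rows = build_training_rows("#H1", [("CT1", ["#H1", "#H2", "#H3"], [0.095, 0.0])])
```

In this simplified sketch all rows from one candidate tweet share the same feature list; in a full system, hashtag-level scores such as the Social Trend score would differ per row.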
![Page 47: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/47.jpg)
Imbalanced Training Dataset
• Occurrences of the ground-truth hashtag Hj among the candidate tweets for <Ti, Hj> are very few
• There is a much higher number of negative samples
• With multiple occurrences, my training dataset has a class distribution of 95% negative samples and 5% positive samples
• Learning the model on an imbalanced dataset causes low precision
![Page 48: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/48.jpg)
SMOTE Over-Sampling
• Possible solutions are under-sampling and over-sampling
• SMOTE (Synthetic Minority Over-sampling Technique) is used to resample to a balanced dataset of 50% positive and 50% negative samples
• SMOTE over-samples by creating synthetic examples rather than by over-sampling with replacement
• It takes each minority class sample and introduces synthetic examples along the line segments joining any/all of the k minority class nearest neighbors
• This approach effectively forces the decision region of the minority class to become more general
Reference: Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, W. Philip Kegelmeyer, "SMOTE: Synthetic Minority Over-sampling Technique" (2002), Journal of Artificial Intelligence Research.
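The interpolation step at the heart of SMOTE can be sketched in plain Python as follows. This is a simplified illustration, not the reference algorithm or the tool used in the thesis: it picks minority samples at random instead of iterating over every sample, and the function name and parameters are hypothetical.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward neighbors.

    Simplification (assumption): base samples are drawn at random rather
    than visiting each minority sample a fixed number of times.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority neighbors of x by squared Euclidean distance.
        neighbors = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from x to neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic


# Hypothetical two-feature minority class.
minority = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
new_points = smote(minority, n_synthetic=5, k=2)
```

Every synthetic point lies on a segment between two real minority samples, which is why the minority decision region broadens instead of collapsing onto duplicated points.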
![Page 49: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/49.jpg)
Learning: Logistic Regression
• I use a logistic regression model rather than a generative model such as NBC or Bayes networks, as my features are highly correlated (shown in the evaluation)
Diagram: for each <Tweet(Ti), Hashtag(Hj)> pair, the <Candidate Tweet, Candidate Hashtag> pairs (CT1,CH1; CT2,CH2; ...; CTi,CHj) form the rows of the feature matrix, with class labels 1 (positive samples) and 0 (negative samples); the logistic regression model learns the weights λ1 to λ9.
![Page 50: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/50.jpg)
Test Dataset
My test dataset is represented in the same format as my training dataset: a feature matrix with the class labels unknown (removed).

Feature matrix for <Tweet(T1), ?>, with one row per <Candidate Tweet, Candidate Hashtag> pair (CT1,CH1; CT2,CH2; ...; CTi,CHj):

| Pair | Similarity Score | Recency Score | Social Trend Score | Attention Score | Favorite Score | Mutual Friend Score | Mutual Followers Score | Common Hashtag Score | Reciprocal Rank | Class Label |
|---|---|---|---|---|---|---|---|---|---|---|
| CT1,CH1 | 0.034 | 0.7 | 0.0135 | 0.0621 | 0.0205 | 0.11345 | 0.22 | 0.611 | 1 | ? |
| CT2,CH2 | 0.0 | 0.613 | 0.215 | 0.316 | 0.0224 | 0.0523 | 0.057 | 0.0301 | 0.5 | ? |
![Page 51: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/51.jpg)
Classification
• If the predicted probability is greater than 0.5, the model labels the hashtag as 1, and as 0 otherwise
• The hashtags labeled 1 are likely to be suitable hashtags
• I rank the top K recommended hashtags by their probabilities
Diagram: the feature matrix for <Query Tweet(Qi), ?>, with one row per <Candidate Tweet, Candidate Hashtag> pair (CT1,CH1; CT2,CH2; ...; CTi,CHj) and unknown class labels, is passed to the logistic regression model, which outputs class label 1 or 0.
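The thresholding and top-K ranking described on this slide can be sketched as below. The input is assumed to be one predicted probability per candidate hashtag; how the thesis aggregates a hashtag that appears in several candidate tweets is not stated, so that step is left out. The name `recommend` and the sample probabilities are hypothetical.

```python
def recommend(probabilities, k=5, threshold=0.5):
    """Label candidate hashtags and return the top K by probability.

    `probabilities` maps each candidate hashtag to its predicted
    probability (assumption: one value per hashtag).
    """
    labels = {tag: int(p > threshold) for tag, p in probabilities.items()}
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k], labels


# Hypothetical model outputs for three candidate hashtags.
top, labels = recommend({"#a": 0.9, "#b": 0.4, "#c": 0.7}, k=2)
```

Ranking uses the probabilities directly, so the 0.5 threshold only affects the 0/1 labels and not the order of the recommendations.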
![Page 52: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/52.jpg)
Implementation: System Example 1 (Top 10)

| Rank | TweetSense | Baseline-SimGlobal | Baseline-SimTime | Baseline-SimRecCount |
|---|---|---|---|---|
| 1 | #KUWTK 0.989970778 | #KUWTK 0.824264068712 | #Scandal 0.82326311013 | #Scandal 0.428809523257 |
| 2 | #tfiosmovie 0.985176542 | #ANTM 0.583979541687 | #ornah 0.819013620132 | #KUWTK 0.428809523257 |
| 3 | #CatchingFire 0.981380129 | #Glee 0.453373612475 | #LALivin 0.816627941101 | #LALivin 0.426536795985 |
| 4 | #ANTM 0.968851541 | #NowPlaying 0.439078783215 | #KUWTK 0.814775850946 | #PansBack 0.426536795985 |
| 5 | #GoTSeason4 0.946418848 | #Scandal 0.435994273991 | #Glee 0.778570381907 | #ornah 0.426536795985 |
| 6 | #Jofferyisdead 0.944493746 | #XFactor 0.425513196481 | #SURFBOARD 0.746003141257 | #Glee 0.381746046493 |
| 7 | #TFIOS 0.941791929 | #Spotify 0.42500253688 | #latergram 0.745075687756 | #goodcompany 0.348682888787 |
| 8 | #Lunch 0.940883835 | #LALivin 0.424264068712 | #Spotify 0.744375215512 | #SURFBOARD 0.348682888787 |
| 9 | #MockingjayPart1trailer 0.9344869 | #PansBack 0.424264068712 | #NowPlaying 0.744375215512 | #JLSQuiz 0.348682888787 |
| 10 | #JoffreysWedding 0.934201161 | #ornah 0.424264068712 | #EFCvAFC 0.730686523119 | #HungryAfricans 0.348682888787 |
![Page 53: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/53.jpg)
Implementation: System Example 2 (Top 5)

| Rank | TweetSense | Baseline-SimGlobal | Baseline-SimTime | Baseline-SimRecCount |
|---|---|---|---|---|
| 1 | #Eurovision 0.998892319 | #photogeeks 0.6 | #photogeeks 0.907490888 | #photogeeks 0.600706714 |
| 2 | #EurovisionSongContest2014 0.997934085 | #FSTVLfeed 0.476912544 | #FSTVLfeed 0.823842681 | #FSTVLfeed 0.429211065 |
| 3 | #garybarlo 0.989491417 | #FestivalFriday 0.424264069 | #FestivalFriday 0.82085025 | #FestivalFriday 0.424970782 |
| 4 | #UKIP 0.988958194 | #barkerscreeklife 0.420229873 | #Pub49 0.745300825 | #Pub49 0.353477299 |
| 5 | #parents 0.98511502 | #IPv6 0.4 | #monumentvalleygame 0.738922 | #sma2013 0.348530303 |
![Page 54: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/54.jpg)
Implementation: System Example 3 (Top 5)

| Rank | TweetSense | Baseline-SimGlobal | Baseline-SimTime | Baseline-SimRecCount |
|---|---|---|---|---|
| 1 | #boxing 0.996480078 | #BoxeoBoricua 0.346937709 | #TU 0.517962946 | #BoxeoBoricua 0.34687581 |
| 2 | #GoldenBoyLive 0.9336961478 | #ListoParaHacerHistoria 0.2889 | #regardless 0.489156945 | #ListoParaHacerHistoria 0.2893 |
| 3 | #USC 0.913498443 | #CaneloAngulo 0.272852636 | #legggoo 0.476362923 | #CaneloAngulo 0.27221214 |
| 4 | #AngelOsuna 0.911312201 | #6pm 0.261133502 | #Shoutout 0.464033604 | #6pm 0.42458613 |
| 5 | #paparazzi 0.90625792 | #Vallarta 0.252135503 | #TeamH 0.44947086 | #sonorasRest 0.42458613 |
![Page 55: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/55.jpg)
![Page 56: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/56.jpg)
Experimental Setup
![Page 57: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/57.jpg)
Dataset
I randomly picked 63 users from a partial random distribution by navigating through the trending hashtags on Twitter.

Characteristics of the dataset:

| Characteristic | Value | Percentage |
|---|---|---|
| Total number of users | 63 | N/A |
| Total tweets crawled | 7,945,253 | 100% |
| Tweets with hashtags | 1,883,086 | 23.70% |
| Tweets without hashtags | 6,062,167 | 76.30% |
| Tweets with exactly one hashtag | 1,322,237 | 16.64% |
| Tweets with more than one hashtag | 560,849 | 7.06% |
| Total number of tweets with user @mentions | 716,738 | 58.63% |
| Total number of favorite tweets | 4,658,659 | 9.02% |
| Total number of tweets with retweets | 1,375,194 | 17.31% |
![Page 58: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/58.jpg)
Evaluation Method
• Randomly pick tweets with exactly one hashtag, which avoids getting credit for recommending generic hashtags
• Deliberately remove the hashtag, and the tweet's retweets, for evaluation
• Pass the tweet as input to my system, TweetSense
• Get the recommended hashtag list
• Check whether the ground-truth hashtag is in the recommended list
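The evaluation loop above amounts to counting, over all sampled tweets, how often the removed hashtag appears among the top N recommendations. A minimal sketch with hypothetical data; the slides call this measure Precision @ N, and the function name `precision_at_n` follows that naming.

```python
def precision_at_n(results, n):
    """Fraction of test tweets whose ground-truth hashtag is in the top N.

    `results` is a list of (ground_truth_hashtag, ranked_recommendations).
    """
    if not results:
        return 0.0
    hits = sum(1 for truth, ranked in results if truth in ranked[:n])
    return hits / len(results)


# Hypothetical outcomes for three test tweets.
results = [
    ("#a", ["#a", "#b"]),  # hit at rank 1
    ("#c", ["#x", "#c"]),  # hit at rank 2
    ("#d", ["#x", "#y"]),  # miss
]
p_at_1 = precision_at_n(results, 1)
p_at_2 = precision_at_n(results, 2)
```

Because each test tweet has exactly one ground-truth hashtag, the measure can only stay the same or rise as N grows.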
![Page 59: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/59.jpg)
![Page 60: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/60.jpg)
Results
Evaluation
![Page 61: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/61.jpg)
External Evaluation with Baseline for All 3 Ranking Methods
Test users: 45 users and 1,599 tweet samples

Chart: External evaluation with baseline on Precision @ N. X-axis: top N hashtags recommended by the system. Y-axis: percentage of sample tweets for which the hashtags are recommended correctly.

| System | N = 5 | N = 10 | N = 15 | N = 20 |
|---|---|---|---|---|
| TweetSense | 45% | 53% | 56% | 59% |
| SimTime (baseline) | 30% | 34% | 38% | 42% |
| SimGlobal (baseline) | 26% | 33% | 37% | 40% |
| SimRecCount (baseline) | 24% | 29% | 32% | 35% |

Total number of sample tweets: 1,599. Number of tweets for which hashtags are recommended correctly at Precision @ K = 5: TweetSense 720, SimTime 487, SimGlobal 422, SimRec 384.
Ranking Quality: TweetSense
![Page 63: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/63.jpg)
Odds Ratio: Feature Comparison with All Features

| Feature | Odds ratio |
|---|---|
| Similarity Score | 0.0942 |
| Recency Score | 0.0022 |
| Social Trend Score | 0.0017 |
| Attention Score | 0 |
| Favorite Score | 0.2837 |
| Mutual Friends Score | 13538.6542 |
| Mutual Followers Score | 0.0923 |
| Common Hashtags Score | 0 |
| Reciprocal Score | 0.7144 |
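In logistic regression, the odds ratio of a feature is conventionally the exponential of its coefficient: the factor by which the odds of the positive class are multiplied when that feature increases by one unit. The sketch below shows the conversion in both directions; it assumes the chart reports odds ratios in this standard sense, and the function names are hypothetical.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)


def coefficient_from_odds_ratio(ratio: float) -> float:
    """Coefficient implied by a strictly positive odds ratio."""
    return math.log(ratio)


# An odds ratio below 1 corresponds to a negative coefficient.
favorite_coefficient = coefficient_from_odds_ratio(0.2837)
```

An odds ratio of exactly 1 means the feature has no effect on the odds. A value shown as 0 cannot be inverted this way, since the logarithm of 0 is undefined.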
![Page 64: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/64.jpg)
Odds Ratio: Feature Comparison without Mutual Friend Score

| Feature | Odds ratio |
|---|---|
| Similarity Score | 0.1123 |
| Recency Score | 0.0024 |
| Social Trend Score | 0.0017 |
| Attention Score | 0 |
| Favorite Score | 0.24 |
| Mutual Followers Score | 3.115 |
| Common Hashtags Score | 0 |
| Reciprocal Score | 0.7717 |
![Page 65: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/65.jpg)
Odds Ratio: Feature Comparison without Mutual Friend, Mutual Followers, and Reciprocal Scores

| Feature | Odds ratio |
|---|---|
| Similarity Score | 0.1134 |
| Recency Score | 0.0026 |
| Social Trend Score | 0.0016 |
| Attention Score | 0 |
| Favorite Score | 0.2112 |
| Common Hashtags Score | 0 |
![Page 66: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/66.jpg)
Odds Ratio: Feature Comparison with Only Mutual Friend Score

| Feature | Odds ratio |
|---|---|
| Mutual Friends Score | 0.2081 |
![Page 67: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/67.jpg)
Precision @ N: Only Mutual Friend Feature Score

Chart: Feature score comparison on Precision @ N with only the Mutual Friend score. X-axis: top N hashtags recommended by the system. Y-axis: percentage of sample tweets for which the hashtags are recommended correctly.

| System | N = 5 | N = 10 | N = 15 | N = 20 |
|---|---|---|---|---|
| TweetSense | 45% | 53% | 56% | 59% |
| SimTime (baseline) | 30% | 34% | 38% | 42% |
| SimGlobal (baseline) | 26% | 33% | 37% | 40% |
| SimRecCount (baseline) | 24% | 29% | 32% | 35% |
| OnlyMutualFriendScore | 2% | 5% | 8% | 11% |

Total number of sample tweets: 1,599. Number of tweets for which hashtags are recommended correctly at Precision @ K = 5: TweetSense 720, SimTime 487, SimGlobal 422, SimRec 384, OnlyMutualFriendRank 37.
![Page 68: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/68.jpg)
![Page 69: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/69.jpg)
Conclusion
![Page 70: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/70.jpg)
Summary
• Proposed a system called TweetSense, which finds additional context for an orphaned tweet by recommending hashtags
• Proposed a better approach to choosing the candidate tweet set by looking at the user's social graph
• Exploited social signals along with the user's tweet history to recommend personalized hashtags
• Performed internal and external evaluation of my system
• Showed that my system performs better than the current state-of-the-art system
![Page 71: TWEETSENSE: RECOMMENDING HASHTAGS FOR ORPHANED TWEETS BY EXPLOITING SOCIAL SIGNALS IN TWITTER](https://reader036.fdocuments.us/reader036/viewer/2022062521/568163cb550346895dd509e0/html5/thumbnails/71.jpg)
Future Work
• Rectify incorrect or irrelevant hashtags by identifying and/or adding the right hashtag for each tweet
• "Named hashtag recognition": aggregate processing of tweets for sentiment and opinion mining
• Use topic models to recommend hashtags based on topic distributions
• Build an incremental learning version and make it an online application