Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams
-
Upload
web-information-systems-tu-delft -
Category
Technology
-
view
1.150 -
download
1
description
Transcript of Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams
DelftUniversity ofTechnology
Twitter, Twinder, Twitcident: Filtering and Search on Social Web StreamsData Bridges Workshop, Inria, Paris, April 12th 2012
Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao
Web Information Systems, TU Delft, the Netherlands
2Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
200,000,000number of tweets published per day
3
Pukkelpop 2011
People tweet about everything,
everywhere :-)
4
Pukkelpop 2011
81,000 tweets in four hours
became a tragedy
Filtering
200,000,000
Search & Browsing
5Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
1. (Automatic) Filtering: Given a topic (e.g. expressed via some keywords), how can one automatically identify those tweets that are relevant to the topic?
2. Search & Browsing: How can one improve search and browsing capabilities so that users can explore information in the streams of tweets (that are relevant for a topic)?
Twitter streams
Challenges
Filtering
topic
Search & Browsing
information need
Twinder filtering and search framework
6Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
1. Filtering of Twitter streams
Twitter streams
Filtering
topic
Search & Browsing
information need
7Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Filtering on Twitter Query: www2012
Typical approach:Keyword-based matching
Are there further features that can be used as indicators for estimating the relevance of a
tweet for a topic?
8Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Syntactical feature: hashtagsIs a tweet more relevant if it contains a #hashtag?
# Hashtag
Hypothesis: tweets that contain hashtags are more likely to be relevant than tweets that do not contain hashtags.
9Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Syntactical feature: URLsIs a tweet that contains a URL more relevant?
URL
Hypothesis: tweets that contain a URL are more likely to be relevant than tweets that do not contain a URL.
10Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Syntactical feature: “mentions”Is a tweet that mentions @somebody more relevant?
@mention
Reply
Hypothesis: tweets that are formulated as a reply to another tweet are less likely to be relevant than other tweets.
11Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Syntactical feature: lengthDoes the length of a tweet influence its relevance for a topic?
Hypothesis: the longer a tweet, the more likely it is to be relevant and interesting.
54 characters (9 words)
140 characters (20 words)
vs.
12Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Overview of features
Topic sensitive Topic insensitive
Keyword-based relevance
Syntactical features
Topic-sensitive and topic-insensitive features
What about the semantics?
13Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Semantic features: number of entities
Find semantics in a tweet to estimate the relevance
dbp:France
dbp:Tim_Berners-Lee dbp:World_Wide_Web
dbp:Lyon
dbp:WWW_Conference
Hypothesis: the more entities a tweet mentions, the more likely it is to be relevant and interesting.
14Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.
Semantic features: diversityThe types of entities that are featured by a tweet matter
Person Thing
PlacePlace
Event
Hypothesis: the higher the diversity of entities that are mentioned in a tweet, the more likely it is to be relevant.
Place Place
Place Place Place Place
vs.
15Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Semantic features: sentimentOpinions expressed in tweets are interesting
Hypothesis: the likelihood of a tweet’s relevance is influenced by its sentiment polarity.
Looking forward to the WWW conference :-) Yes!
I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.
Why are the big players not releasing query logs to the WWW community? :-( #fail
:-) neutral :-(vs. vs.
16Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Semantic relatednessExploit semantics to relate query with tweets
Query : “TBL WWW keynote”
dbp:Tim_Berners-Lee
dbp:International_World_Wide_Web_Conference
17Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Overview of features
Topic sensitive Topic insensitive
Keyword-based Syntactical
Semantic-based Semantics
Context? Context?
By now, we have 4 types of features.
What kind of contextual features might be helpful?
18Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Contextual feature: authority of the publisherIt matters who published a tweet
Hypothesis: the higher the number of tweets that have been published by the creator of a tweet, the more likely it is that the tweet is relevant.
19Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Contextual feature: time w.r.t. query
When was a tweet published?
Hypothesis: the lower the temporal distance between the query time and the creation time of a tweet, the more likely is the tweet relevant to the topic.
Tweet
March 31
query
April 16
20Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Summary of Features
Topic sensitive Topic insensitive
Keyword-based Syntactical
Semantic-based Semantics
Context-based Context
21Challenge the future
without semantics 0.3363 0.4828 0.3965
all features 0.3674 0.4736 0.4138
ResultsAchieved for the TREC Microblog Challenge
Features Precision Recall F-measure
keyword relevance
0.3040 0.2924 0.2981
semantic relevance
0.3053 0.2931 0.2991
Overall, we can achieve the precision and recall of over 35% and 45% respectively by applying all the features.
22Challenge the future
Importance of Features
hasHashtag hasURL isReply length
-1
0
1
2
Keyword-based relevance
-1
0
1
2
Relevance Relatedness
-1
0
1
2
#entities diversity sentiment
-1
0
1
2
Topic-sensitive
Keyword-based relevance
-1
0
1
2
Temporal context Keyword-based relevance
-1
0
1
2
Social context
Topic-insensitive
Keyword-based
Semantic-based
Context-based
Syntactical
Semantics
Context
Semantic relatedness, URLs, !isReply, diversity and
sentiment are good indicators for estimating the
relevance of a tweet.
23Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
2. Search & Browsing in Twitter Streams
Twitter streams
Filtering
topic
Search & Browsing
information need
24Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Idea: Faceted Search
Locations more...
Events more...
Music Artists:+ Guilty Simpson+ Bryan Adams+ Elton John+ Golden Earringmore...
Current Query:
Results:1. Yskiddd: Next saturday
@thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords2
2. Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL
3. sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents
Eindhoven Music
Expand Query:
Suggestions:+ Guilty Simpson+ Area51
25Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Adaptive Faceted Search
Adaptive Faceted Search
Twitter posts
Semantic Enrichment
User and Context Modeling
user
How to adapt the facet-value pair ranking to the
current demands of the user?
query suggestions
How to represent the content of a
tweet? facet extraction
26Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Facet Extraction and Semantic Enrichment
@bob: Julian Assange got arrested http://bit.ly/5d4r2t
Julian Assange
Julian Assange Tweet-basedenrichment
Julian Assange arrestedJulian Assange, the founder ofWikiLeaks, is under arrest inLondon…
Link-basedenrichment
Julian Assange
London
WikiLeaks
Julian Assange Julian Assange
LondonWikiLeaks
powered by
27Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Faceted-search vs. hashtag-based (keyword) search
Faceted search based on semantic
enrichment of tweets outperforms
hashtgag-based search significantly.
28Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Impact of link-based enrichmentPersonalized strategy outperforms baseline
significantly
Link-based enrichment improves quality for
both strategies
29Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Twitcident: Applying filter & search functionality for distilling information from Twitter during incidents (e.g. fires, extreme weather situations)
Twitter streams
Filtering
topic
Search & Browsing
information need
Twitcident application
30
Pukkelpop 2011
81,000 tweets in four hours
became a tragedy
Filtering
200,000,000
Search & Browsing
31Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Twitcident Pipeline
Automatic Filtering
Search & Browsing
32
Filtered Twitter stream
Faceted Search
33
Real-time visualizations
34
Could we see it coming?
Term usage 25 minutes before the incident
1. heavy weather, hail balls, lightning, pitch black…2. drama, panic, hell, serious, extreme…
Impact storm
“ ”
Popular artist made a joke
about the weather
35
Spotting eye witnesses
36
Real-time information from eyewitness
37Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
SummaryAutomatic Filtering of Tweets: • Topic-sensitive and topic-insensitive features • Semantic features (semantic relatedness, diversity,
sentiment are beneficial)
Search and browsing:• Faceted Search• Personalization & contextualization helps
Application:• Twitcident: fulfilling information needs during incidents
Future works:• Weak signal detection based on tweets• Duplicate detection
[ISWC ’11]
[Hypertext ‘12, Demo@WWW’12]
[#MSM@WWW ’12]
38Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Thank you!
@fabianabelhttp://wis.ewi.tudelft.nl/