Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

38
Delft University of Technology Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams Data Bridges Workshop, Inria, Paris, April 12 th 2012 Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao Web Information Systems, TU Delft, the Netherlands

description

Slides

Transcript of Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

Page 1: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

DelftUniversity ofTechnology

Twitter, Twinder, Twitcident: Filtering and Search on Social Web StreamsData Bridges Workshop, Inria, Paris, April 12th 2012

Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao

Web Information Systems, TU Delft, the Netherlands

Page 2: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

2Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

200,000,000number of tweets published per day

Page 3: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

3

Pukkelpop 2011

People tweet about everything,

everywhere :-)

Page 4: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

4

Pukkelpop 2011

81,000 tweets in four hours

became a tragedy

Filtering

200,000,000

Search & Browsing

Page 5: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

5Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

1. (Automatic) Filtering: Given a topic (e.g. expressed via some keywords), how can one automatically identify those tweets that are relevant to the topic?

2. Search & Browsing: How can one improve search and browsing capabilities so that users can explore information in the streams of tweets (that are relevant for a topic)?

Twitter streams

Challenges

Filtering

topic

Search & Browsing

information need

Twinder filtering and search framework

Page 6: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

6Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

1. Filtering of Twitter streams

Twitter streams

Filtering

topic

Search & Browsing

information need

Page 7: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

7Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Filtering on Twitter Query: www2012

Typical approach:Keyword-based matching

Are there further features that can be used as indicators for estimating the relevance of a

tweet for a topic?

Page 8: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

8Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Syntactical feature: hashtagsIs a tweet more relevant if it contains a #hashtag?

# Hashtag

Hypothesis: tweets that contain hashtags are more likely to be relevant than tweets that do not contain hashtags.

Page 9: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

9Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Syntactical feature: URLsIs a tweet that contains a URL more relevant?

URL

Hypothesis: tweets that contain a URL are more likely to be relevant than tweets that do not contain a URL.

Page 10: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

10Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Syntactical feature: “mentions”Is a tweet that mentions @somebody more relevant?

@mention

Reply

Hypothesis: tweets that are formulated as a reply to another tweet are less likely to be relevant than other tweets.

Page 11: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

11Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Syntactical feature: lengthDoes the length of a tweet influence its relevance for a topic?

Hypothesis: the longer a tweet, the more likely it is to be relevant and interesting.

54 characters (9 words)

140 characters (20 words)

vs.

Page 12: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

12Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Overview of features

Topic sensitive Topic insensitive

Keyword-based relevance

Syntactical features

Topic-sensitive and topic-insensitive features

What about the semantics?

Page 13: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

13Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Semantic features: number of entities

Find semantics in a tweet to estimate the relevance

dbp:France

dbp:Tim_Berners-Lee dbp:World_Wide_Web

dbp:Lyon

dbp:WWW_Conference

Hypothesis: the more entities a tweet mentions, the more likely it is to be relevant and interesting.

Page 14: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

14Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.

Semantic features: diversityThe types of entities that are featured by a tweet matter

Person Thing

PlacePlace

Event

Hypothesis: the higher the diversity of entities that are mentioned in a tweet, the more likely it is to be relevant.

Place Place

Place Place Place Place

vs.

Page 15: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

15Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Semantic features: sentimentOpinions expressed in tweets are interesting

Hypothesis: the likelihood of a tweet’s relevance is influenced by its sentiment polarity.

Looking forward to the WWW conference :-) Yes!

I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.

Why are the big players not releasing query logs to the WWW community? :-( #fail

:-) neutral :-(vs. vs.

Page 16: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

16Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Semantic relatednessExploit semantics to relate query with tweets

Query : “TBL WWW keynote”

dbp:Tim_Berners-Lee

dbp:International_World_Wide_Web_Conference

Page 17: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

17Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Overview of features

Topic sensitive Topic insensitive

Keyword-based Syntactical

Semantic-based Semantics

Context? Context?

By now, we have 4 types of features.

What kind of contextual features might be helpful?

Page 18: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

18Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Contextual feature: authority of the publisherIt matters who published a tweet

Hypothesis: the higher the number of tweets that have been published by the creator of a tweet, the more likely it is that the tweet is relevant.

Page 19: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

19Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Contextual feature: time w.r.t. query

When was a tweet published?

Hypothesis: the lower the temporal distance between the query time and the creation time of a tweet, the more likely is the tweet relevant to the topic.

Tweet

March 31

query

April 16

Page 20: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

20Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Summary of Features

Topic sensitive Topic insensitive

Keyword-based Syntactical

Semantic-based Semantics

Context-based Context

Page 21: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

21Challenge the future

without semantics 0.3363 0.4828 0.3965

all features 0.3674 0.4736 0.4138

ResultsAchieved for the TREC Microblog Challenge

Features Precision Recall F-measure

keyword relevance

0.3040 0.2924 0.2981

semantic relevance

0.3053 0.2931 0.2991

Overall, we can achieve the precision and recall of over 35% and 45% respectively by applying all the features.

Page 22: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

22Challenge the future

Importance of Features

hasHashtag hasURL isReply length

-1

0

1

2

Keyword-based relevance

-1

0

1

2

Relevance Relatedness

-1

0

1

2

#entities diversity sentiment

-1

0

1

2

Topic-sensitive

Keyword-based relevance

-1

0

1

2

Temporal context Keyword-based relevance

-1

0

1

2

Social context

Topic-insensitive

Keyword-based

Semantic-based

Context-based

Syntactical

Semantics

Context

Semantic relatedness, URLs, !isReply, diversity and

sentiment are good indicators for estimating the

relevance of a tweet.

Page 23: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

23Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

2. Search & Browsing in Twitter Streams

Twitter streams

Filtering

topic

Search & Browsing

information need

Page 24: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

24Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Idea: Faceted Search

Locations more...

Events more...

Music Artists:+ Guilty Simpson+ Bryan Adams+ Elton John+ Golden Earringmore...

Current Query:

Results:1. Yskiddd: Next saturday

@thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords2

2. Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL

3. sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents

Eindhoven Music

Expand Query:

Suggestions:+ Guilty Simpson+ Area51

Page 25: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

25Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Adaptive Faceted Search

Adaptive Faceted Search

Twitter posts

Semantic Enrichment

User and Context Modeling

user

How to adapt the facet-value pair ranking to the

current demands of the user?

query suggestions

How to represent the content of a

tweet? facet extraction

Page 26: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

26Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Facet Extraction and Semantic Enrichment

@bob: Julian Assange got arrested http://bit.ly/5d4r2t

Julian Assange

Julian Assange Tweet-basedenrichment

Julian Assange arrestedJulian Assange, the founder ofWikiLeaks, is under arrest inLondon…

Link-basedenrichment

Julian Assange

London

WikiLeaks

Julian Assange Julian Assange

LondonWikiLeaks

powered by

Page 27: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

27Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Faceted-search vs. hashtag-based (keyword) search

Faceted search based on semantic

enrichment of tweets outperforms

hashtgag-based search significantly.

Page 28: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

28Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Impact of link-based enrichmentPersonalized strategy outperforms baseline

significantly

Link-based enrichment improves quality for

both strategies

Page 29: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

29Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Twitcident: Applying filter & search functionality for distilling information from Twitter during incidents (e.g. fires, extreme weather situations)

Twitter streams

Filtering

topic

Search & Browsing

information need

Twitcident application

Page 30: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

30

Pukkelpop 2011

81,000 tweets in four hours

became a tragedy

Filtering

200,000,000

Search & Browsing

Page 31: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

31Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Twitcident Pipeline

Automatic Filtering

Search & Browsing

Page 32: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

32

Filtered Twitter stream

Faceted Search

Page 33: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

33

Real-time visualizations

Page 34: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

34

Could we see it coming?

Term usage 25 minutes before the incident

1. heavy weather, hail balls, lightning, pitch black…2. drama, panic, hell, serious, extreme…

Impact storm

“ ”

Popular artist made a joke

about the weather

Page 35: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

35

Spotting eye witnesses

Page 36: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

36

Real-time information from eyewitness

Page 37: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

37Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

SummaryAutomatic Filtering of Tweets: • Topic-sensitive and topic-insensitive features • Semantic features (semantic relatedness, diversity,

sentiment are beneficial)

Search and browsing:• Faceted Search• Personalization & contextualization helps

Application:• Twitcident: fulfilling information needs during incidents

Future works:• Weak signal detection based on tweets• Duplicate detection

[ISWC ’11]

[Hypertext ‘12, Demo@WWW’12]

[#MSM@WWW ’12]

Page 38: Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

38Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Thank you!

@fabianabelhttp://wis.ewi.tudelft.nl/