ESSIR 2013 - IR and Social Media
9th European Summer School in Information Retrieval, September 4th, 2013
http://bit.ly/ESSIR13IRSocMedia
IR and Social Media
Arjen P. de Vries
Centrum Wiskunde & Informatica
Delft University of Technology
Spinque B.V.
On SlideShare, IR = Investor Relations
Social Media
Noun
social media (plural only)
Interactive forms of media that allow users to interact with and publish to each other, generally by means of the Internet.
The early 21st century saw a huge increase in social media thanks to the widespread availability of the Internet.
http://www.webanalyticsworld.net/2010/11/history-of-social-media-infographic.html
Social Media
“Social bookmarking” sites
“User generated content”: images (Flickr) and videos (YouTube, Vimeo), but also blogs
Social network services: Twitter, Facebook
Not just one beast!
IR and Social Media?
Red Hot Chili Peppers
“Rock group” in author’s metadata...
Organisation in groups may help disambiguate the query!
More implicit metadata...
Information Science
“Search for the fundamental knowledge which will allow us to postulate and utilize the most efficient combination of [human and machine] resources”
M.E. Senko. Information systems: records, relations, sets, entities, and things. Information systems, 1(1):3–13, 1975.
Core Questions
How to represent information?
  The information need and search requests
  The objects to be shown in response to an information request
How to match information representations?
IR and Social Media
Richer information representations!
Richer representations
User profiles: user name, full name, description, image, homepage URL, etc.
Connections between users: networks of friends, followers, etc.
Comments/reactions: endorsing and sharing
Q: Is the Web ancient social media?
(C) 2008, The New York Times Company
Anchor text: “continue reading”
Not a lot of info to represent the page…
A fan’s Hyves page: Kyteman's HipHop Orchestra: www.kyteman.com
Ticket sales, Luxor Theater: 22 May - Kyteman's hiphop Orkest - www.kyteman.com
Kluun.nl: Kyteman's site
Blog Rockin’ Beats: The 21-year-old Kyteman (trumpeter, composer and producer Colin Benders) worked for three years on his debut: The Hermit Sessions.
Jazzenzo: ...a performance by the popular Kyteman’s Hiphop Orkest
‘Co-creation’
Social media: the consumer becomes a co-creator
‘Data consumption’ traces
In essence: many new sources that can play the role of anchor text: tags and/or ratings, tweets, comments, reviews
Potential Benefits for IR
Expand the content representation
Reduce the vocabulary gap(s) between creators of content, indexers, and users
More diverse views on the same content
Potential Benefits for IR
Relevance depends on user context: user task, user knowledge
Social media provide an opportunity to make much better assumptions about user context: a specific user’s context, and the variety of user contexts that may exist
Maarten Clements, Arjen P. de Vries and Marcel J.T. Reinders.
The task dependent effect of tags and ratings on social media access.
TOIS 28, 4, article 21 (November 2010), 42 pages.
LibraryThing
Items, people, tags, ratings
See also: http://www.macle.nl/tud/LT/
Synonyms
Examples: Humour, Classic
Search with Random Walk
Present nodes in order of the estimated probability that a random walk starting from (task-dependent) start nodes would end at that node
E.g., tag suggestion starts in a tag node; personalized search in tag and user nodes
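The slides give no formula for the walk. Below is a minimal sketch, assuming a random walk with restart (personalised PageRank) over one combined user/item/tag graph; the restart probability, iteration count and toy graph are hypothetical, and the walk in the cited TOIS paper may be parameterised differently (for example in walk length or edge weighting).

```python
import numpy as np


def random_walk_scores(adjacency, start_nodes, restart=0.15, iterations=50):
    """Approximate visiting probabilities of a random walk with restart.

    adjacency: n x n array of non-negative edge weights over all user,
    item and tag nodes. start_nodes: task-dependent node indices.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    # Row-normalise edge weights into transition probabilities; nodes
    # without edges keep an all-zero row (the toy graph below has none).
    out_weight = adjacency.sum(axis=1, keepdims=True)
    transition = np.divide(adjacency, out_weight,
                           out=np.zeros_like(adjacency), where=out_weight > 0)
    # Restart distribution: uniform over the start nodes.
    restart_vec = np.zeros(n)
    restart_vec[list(start_nodes)] = 1.0 / len(start_nodes)
    p = restart_vec.copy()
    for _ in range(iterations):
        # Follow an edge with prob. 1 - restart, else jump back to a start node.
        p = (1.0 - restart) * (p @ transition) + restart * restart_vec
    return p


# Hypothetical toy graph: 0 = user, 1 = book A, 2 = book B,
# 3 = tag "java", 4 = tag "coffee".
edges = [(0, 1), (1, 3), (0, 3), (2, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0  # undirected links

# Tag-based search: start the walk in the tag node "java".
scores = random_walk_scores(A, start_nodes=[3])
# Personalised search: start in both the query tag and the user node.
personal = random_walk_scores(A, start_nodes=[3, 0])
print(np.argsort(-scores))  # node indices, most probable first
```

Switching between tasks only changes the start nodes: starting in both the tag node and the user node, as in personalised search, spreads the restart mass over both.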
Tagging Relationships
An item recommendation walk
Ratings
Ratings may enhance the graph, or just be used for evaluation
Personalized Search
Assume a user who types a single tag as query
A soft clustering effect smoothly relates similar concepts before converging to the background probability
Homographs like “Java” are disambiguated because the walk starts in both the query tag and the target user, so content that matches the user’s preference is more likely to be found first
Common System Designs
Analysis results
Allowing all users to tag all available content improves retrieval tasks
Combining tags and ratings may improve both search and recommendation tasks
Ternary relation lost!
The UIT matrix represents a ternary relation, that is lost when creating the three UI, IT and UT matrices
Potentially a problem if tags express an opinion about an item:
“poetry” can still describe the user independently of the item,
whereas “awful” requires knowing which item the term belongs to
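A toy illustration of why the projection loses information, using made-up users, items and tags: two different sets of (user, item, tag) assignments produce identical UI, IT and UT count matrices, yet disagree on which item user u1 called “awful”.

```python
from collections import Counter


def project(triples):
    """Collapse (user, item, tag) triples into UI, IT and UT count matrices."""
    ui = Counter((u, i) for u, i, _ in triples)
    it = Counter((i, t) for _, i, t in triples)
    ut = Counter((u, t) for u, _, t in triples)
    return ui, it, ut


# Hypothetical tag assignments: same pairs overall, different opinions.
uit_a = [("u1", "i1", "awful"), ("u2", "i2", "awful"),
         ("u1", "i2", "great"), ("u2", "i1", "great")]
uit_b = [("u1", "i1", "great"), ("u2", "i2", "great"),
         ("u1", "i2", "awful"), ("u2", "i1", "awful")]

print(project(uit_a) == project(uit_b))  # True: the projections cannot tell them apart
print(set(uit_a) == set(uit_b))          # False: the ternary relations differ
```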
Tags vs. ratings
Most tags do not deviate far from the mean rating
Only a few tags are strongly correlated with opinion
Note: poetry is rated as higher quality than chicklit
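One simple way to measure how far a tag deviates from the mean rating, sketched under the assumption that each tag assignment can be paired with the same user's rating of that item. The observations and the 0.5 threshold are hypothetical toy values, not the study's data.

```python
from statistics import mean


def tag_rating_deviation(observations):
    """Mean rating of the items carrying each tag, minus the overall mean rating.

    observations: (item, tag, rating) tuples, where rating is the score the
    tagging user gave that item.
    """
    overall = mean(rating for _, _, rating in observations)
    ratings_per_tag = {}
    for _, tag, rating in observations:
        ratings_per_tag.setdefault(tag, []).append(rating)
    return {tag: mean(r) - overall for tag, r in ratings_per_tag.items()}


# Hypothetical toy data.
observations = [
    ("b1", "poetry", 4), ("b2", "poetry", 5),
    ("b3", "chicklit", 2), ("b4", "chicklit", 3),
    ("b5", "fiction", 3), ("b6", "fiction", 4),
]
deviation = tag_rating_deviation(observations)
# Tags far from 0 carry opinion; tags near 0 are mostly descriptive.
opinion_tags = sorted(t for t, d in deviation.items() if abs(d) >= 0.5)
print(deviation)
```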
Metadata
Scientific articles have many types of associated metadata: abstract, author, booktitle, description, journal, tags
Are all these types of metadata useful for item recommendation?
Metadata
According to Toine Bogers’ PhD thesis:
“Profile-centric matching”: concatenate all fields associated with the items in a single user’s profile into one huge text field, and use an off-the-shelf IR model to match the profile against the metadata of the items
“Item-based hybrid filtering”: alternatively, construct item profiles from the metadata of all users for that item, and apply an item-based collaborative filtering approach
Author, description, tags, title, url, journal and booktitle all contribute
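A minimal sketch of profile-centric matching. The “off-the-shelf IR model” is stood in for by a plain TF-IDF cosine similarity (an assumption; the thesis may use a different retrieval model), and the item dictionaries, their `id` key and the toy data are hypothetical. Only the `title` and `tags` fields are used here.

```python
import math
from collections import Counter


def tokens(text):
    return text.lower().split()


def item_text(item, fields):
    # Concatenate the selected metadata fields of one item into one text field.
    return " ".join(item.get(field, "") for field in fields)


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def profile_centric_rank(user_items, candidates, fields):
    """Rank candidate items by similarity to one big 'profile document'."""
    # All fields of all profile items become a single text field.
    profile = tokens(" ".join(item_text(item, fields) for item in user_items))
    docs = [tokens(item_text(c, fields)) for c in candidates]
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))

    def tfidf(doc):
        # Smoothed IDF so that terms occurring in every candidate still count a little.
        return {t: c * (math.log((n + 1) / (df[t] + 1)) + 1.0)
                for t, c in Counter(doc).items()}

    query = tfidf(profile)
    scored = [(cosine(query, tfidf(doc)), c["id"]) for doc, c in zip(docs, candidates)]
    return [item_id for _, item_id in sorted(scored, reverse=True)]


# Hypothetical toy data.
user_items = [
    {"title": "Probabilistic models for retrieval", "tags": "ir language-models"},
    {"title": "Tagging and folksonomies", "tags": "tagging social"},
]
candidates = [
    {"id": "c1", "title": "Language models for information retrieval", "tags": "ir language-models"},
    {"id": "c2", "title": "Protein folding dynamics", "tags": "biology"},
]
ranking = profile_centric_rank(user_items, candidates, fields=["title", "tags"])
print(ranking)
```

Adding further fields (author, description, url, journal, booktitle) only changes the `fields` list; the matching itself stays the same.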
Finally: a recent case study
Artist Popularity?
Let’s ask widely used social media music platforms, i.e., query their APIs!
Artist Popularity (1-3)
Top-5 popular artists in the dataset, Jan 21 – Mar 21
3 hourly timestamped popularity indices
Artist Popularity
Artist Popularity (?!)
The Black Keys
Three Grammy awards received!
The Web responds, while the service-based popularity index stays static
Implications
An “artist popularity” index depends on the platform and its user population
Web-based popularity (estimated via a URL shortener’s API) “reacts” to real-world events
Suitable as a search log replacement for academics?
Q: Which popularity is most useful: one that changes dynamically, or one that lasts?
Many topics I skipped…
Tweets about blip.tv
“Twanchor text”, e.g., for http://blip.tv/file/2168377:
Amazing
Watching “World’s most realistic 3D city models?”
Google Earth/Maps killer
Ludvig Emgard shows how maps/satellite pics on web is done (learn Google and MS!)
...and ~120 more tweets
Wikipedia
Wikipedia contains semantically very rich annotations: Wikipedia categories, Wikipedia lists, times (1930, 1931, 1932, etc.), names, disambiguation pages, etc.
Note: DBpedia is just Wikipedia
Wikipedia
People have used Wikipedia edit history to look for events
Geotags / POIs
Many social media items carry explicit geo information:
Geotags are low-level “coordinates”
POIs are high-level “point-of-interest” labels
Applications: recommend geo-locations to people, predict POI tags from (tweet) text, predict where a user will go next
Map text to locations
Build a language model from all tags assigned to flickr images that belong to a predefined grid cell
Neighbouring cells are used for smoothing (like the hierarchical language models used previously for video / scene / shot)
Use the user frequency of a term in a location (the number of distinct users who used it) instead of its term frequency
Neil O’Hare and Vanessa Murdock
Modeling Locations with Social Media
Information Retrieval, February 2013, Volume 16, Issue 1, pp 30-62
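A minimal sketch of the idea, not the authors' implementation: a 1-degree grid, tag counts replaced by user frequencies, and each cell's model linearly interpolated with its eight neighbouring cells and an add-one smoothed background model. The grid size, interpolation weights and toy photos (echoing the Athens example) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def cell_of(lat, lon, size=1.0):
    # Hypothetical grid: square cells of `size` degrees.
    return (math.floor(lat / size), math.floor(lon / size))


def build_cell_models(photos, size=1.0):
    """photos: (user, lat, lon, tags). Returns cell -> Counter(term -> user frequency)."""
    users = defaultdict(lambda: defaultdict(set))
    for user, lat, lon, tags in photos:
        cell = cell_of(lat, lon, size)
        for tag in tags:
            users[cell][tag].add(user)
    # User frequency: distinct users per term, so one user's bulk upload
    # does not dominate the cell's model.
    return {cell: Counter({t: len(u) for t, u in terms.items()})
            for cell, terms in users.items()}


def pooled(models, cells):
    total = Counter()
    for cell in cells:
        total.update(models.get(cell, Counter()))
    return total


def neighbours(cell):
    r, c = cell
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def p_ml(counts, term):
    total = sum(counts.values())
    return counts[term] / total if total else 0.0


def log_likelihood(models, cell, query_tags, background, weights=(0.6, 0.3, 0.1)):
    w_cell, w_nb, w_bg = weights
    nb = pooled(models, neighbours(cell))
    bg_total, vocab = sum(background.values()), len(background)
    score = 0.0
    for t in query_tags:
        p_bg = (background[t] + 1) / (bg_total + vocab)  # add-one keeps p > 0
        p = w_cell * p_ml(models[cell], t) + w_nb * p_ml(nb, t) + w_bg * p_bg
        score += math.log(p)
    return score


def locate(models, query_tags):
    background = pooled(models, models.keys())
    return max(models, key=lambda cell: log_likelihood(models, cell, query_tags, background))


# Hypothetical toy photos.
photos = [
    ("u1", 37.97, 23.72, ["athens", "acropolis"]),
    ("u1", 37.96, 23.71, ["athens"]),  # same user again: user frequency stays 1
    ("u2", 37.98, 23.73, ["acropolis", "parthenon"]),
    ("u3", 39.33, -82.10, ["athens", "ohio"]),
    ("u4", 39.32, -82.09, ["ohio", "university"]),
]
models = build_cell_models(photos)
greece, ohio = cell_of(37.97, 23.72), cell_of(39.33, -82.10)
print(locate(models, ["athens", "acropolis"]), locate(models, ["athens", "ohio"]))
```

The tag “athens” alone is ambiguous here; the co-occurring tags decide between the two cells.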
Placing Images: Easy
http://www.flickr.com/photos/63666148@N00/3615989115/
Athens, Ohio or Athens, Greece?
Placing Images: Hard
Ballooning company in Ottawa
Searching the Social Graph
Search entities, and the relationships between them, in the (facebook) social graph
Clearly IR problems, but who has the data to work with?
Michael Curtiss et al.
Unicorn: A System for Searching the Social Graph
PVLDB, Vol. 6, No. 11
Crawling
How to get “the” data?
Rate-limited APIs
Terms of Service (ToS)
HEADACHES!
Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen M. Carley
Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose
ICWSM 2013
Not IR yet, but… interesting stuff nevertheless!
de Volkskrant, March 13, 2013
Michal Kosinski, David Stillwell, and Thore Graepel
Private traits and attributes are predictable from digital records of human behavior
PNAS 2013; published ahead of print March 11, 2013, doi:10.1073/pnas.1218772110
Take home message(s)
Social media give us IR researchers access to a rich resource of context, including time & location!
Gather the right data for your problem domain, and it may be a good substitute for the click data we all want so badly but do not have
Various recommendation and retrieval tasks exist in social media: can one theory address all of them?
C U @ #ECIR2014?!