Thesis Proposal: Prediction of popular social annotations Abon.
Outline
Background, Related Work, Problem Definition, Possible Solution, Experiment Plan, Evaluation Plan
Background
Prevalence of social web services
What do they have in common? Tags and user-generated content.
Background
What are TAGs for?
According to the del.icio.us founder: "Tags are one-word descriptors that you can assign to your bookmarks on del.icio.us to help you organize and remember them. Tags are a little bit like keywords, but they're chosen by you, and they do not form a hierarchy. You can assign as many tags to a bookmark as you like and rename or delete the tags later. So, tagging can be a lot easier and more flexible than fitting your information into preconceived categories or folders."
Why TAGs are useful
In information retrieval, expanding a query is a common technique for retrieving more related data.
Tags act like human-expanded index terms.
Why TAGs are useful
Traditional term expansion schemes rely on term-document relations, and each term's importance to a document is often determined by tf-idf.
Each tag a user applies works like a vote for which tag should go with a document. Thus the term-document relation can be measured by tag applications.
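The voting idea can be illustrated with a minimal sketch. It assumes that each user contributes at most one vote per tag for a given document and that a tag's weight is the fraction of users who applied it. The function name `tag_vote_weights`, the input format, and the sample data are hypothetical and not taken from the proposal.

```python
from collections import Counter

def tag_vote_weights(tag_applications):
    """Turn per-user tag applications for one document into vote shares.

    tag_applications: list of tag collections, one per user (assumed format).
    Returns a dict mapping each tag to the fraction of users who applied it.
    """
    n_users = len(tag_applications)
    if n_users == 0:
        return {}
    votes = Counter()
    for tags in tag_applications:
        votes.update(set(tags))  # each user votes at most once per tag
    return {tag: count / n_users for tag, count in votes.items()}

# Made-up example: four users tagging the same URL.
applications = [
    {"python", "tutorial"},
    {"python"},
    {"python", "programming"},
    {"tutorial"},
]
weights = tag_vote_weights(applications)
```

Here the weight plays the role that tf-idf plays for extracted terms, but it is derived from human tag applications rather than from the document text.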
Why TAGs are useful
Tags are a human-expanded query set, which enables more complete concept mapping.
As more and more people apply tags, the popularity of tags reaches a stable pattern, and the top tags can be used as weighting parameters for search optimization.
Related Work
Golder SA, Huberman BA. Usage patterns of collaborative tagging systems. J. Inf. Sci., Vol. 32, No. 2 (April 2006), pp. 198-208.
After 100+ users, a stable pattern appears
Urn model
Related Work
Cattuto C, Loreto V, Pietronero L. Collaborative Tagging and Semiotic Dynamics.
Long-term memory version of the classic Yule–Simon process
Memory model based on a cognitive model
Related Work
Halpin H, Robu V, Shepherd H. The Complex Dynamics of Collaborative Tagging. In Proceedings of WWW 2007.
P(x): the tag probability distribution at each time point
Q(x): the final tag probability distribution
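One standard way to measure how far the distribution at a time point, P(x), is from the final distribution, Q(x), is the Kullback-Leibler divergence. The sketch below assumes that measure, and treating it as the comparison intended on this slide is an assumption. The helper names and the tag data are made up for illustration.

```python
import math
from collections import Counter

def tag_distribution(tags):
    """Normalize raw tag applications into a probability distribution."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}

def kl_divergence(p, q):
    """D_KL(P || Q) = sum over x of P(x) * ln(P(x) / Q(x)).

    Assumes every tag with P(x) > 0 also has Q(x) > 0, which holds when
    P is an earlier snapshot of the same tagging history that produced Q.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

# Made-up tagging history for one URL.
final_tags = ["web", "web", "design", "css"]   # all applications so far
early_tags = ["web", "css"]                    # applications at an early time point

q = tag_distribution(final_tags)
p_early = tag_distribution(early_tags)
divergence = kl_divergence(p_early, q)
```

A divergence that shrinks toward zero over successive time points would indicate that the tag distribution of a URL is settling into its final shape.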
Problem Definition
In its initial stage, a URL has not yet been sufficiently annotated by people, so it is hard to retrieve.
For an immature URL, predicting its future popular tags could provide a better retrieval experience.
Mature URL: a notion borrowed from [Halpin]'s empirical results on tag dynamics. Mature URLs are defined as URLs with 3 or more years of history on del.icio.us.
Expanding tag set
$T_i$: the tag set applied by the $i$-th user to a URL.
$ET_i$: the expanded tag set after the $i$-th user.
$T_0$: the tag set suggested by tf-idf term extraction; $ST_0 = T_0$.
$ET_i = ET_{i-1} \cup \mathrm{relevant}_n(T_i)$
$\mathrm{relevant}_n(T_i)$: the $n$ tags with the highest mutual information to each tag in $T_i$.
Mutual information: $\dfrac{f(t_i, t_j)}{f(t_i)\, f(t_j)}$
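The expansion step can be sketched as follows. This is an illustration under stated assumptions, not the proposal's implementation: f(t) is taken to be the number of taggings containing tag t, f(ti,tj) the number of taggings containing both tags, and "the n tags with top mutual information to each tag" is read as taking the top n per tag and merging the results. All function names and the tiny corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations

def count_frequencies(tag_sets):
    """Count f(t) and f(ti, tj) over a corpus of taggings (assumed definition)."""
    single, pair = Counter(), Counter()
    for tags in tag_sets:
        unique = sorted(set(tags))
        single.update(unique)
        pair.update(frozenset(p) for p in combinations(unique, 2))
    return single, pair

def mutual_information(ti, tj, single, pair):
    """The ratio from the slide: f(ti, tj) / (f(ti) * f(tj))."""
    if single[ti] == 0 or single[tj] == 0:
        return 0.0
    return pair[frozenset((ti, tj))] / (single[ti] * single[tj])

def relevant_n(user_tags, n, single, pair):
    """For each tag in T_i, collect the n co-occurring tags with the highest score."""
    related = set()
    for ti in user_tags:
        candidates = [tj for tj in single
                      if tj != ti and pair[frozenset((ti, tj))] > 0]
        # Highest score first; ties broken alphabetically for determinism.
        candidates.sort(key=lambda tj: (-mutual_information(ti, tj, single, pair), tj))
        related.update(candidates[:n])
    return related

def expand(previous_et, user_tags, n, single, pair):
    """ET_i = ET_{i-1} united with relevant_n(T_i), following the slide literally."""
    return set(previous_et) | relevant_n(user_tags, n, single, pair)

# Made-up corpus of taggings.
corpus = [
    {"python", "programming"},
    {"python", "programming", "tutorial"},
    {"python", "tutorial"},
    {"java", "programming"},
]
single, pair = count_frequencies(corpus)
et_1 = expand({"code"}, {"python"}, 1, single, pair)
```

Note that this ratio favors rare tags: in the sample corpus the top match for "programming" is "java", which occurs only once.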
Cohesivity
Each tag in $ET_i$ has a score that indicates its cohesivity to $ET_i$.
Cohesivity of $t_j$ to $ET_i$: $\sum_{t_k \in ET_i} \dfrac{f(t_k, t_j)}{f(t_j)\, f(t_k)}$
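A minimal sketch of the cohesivity score, using hand-written frequency tables. Two assumptions go beyond the slide: the term with tk equal to tj is skipped, and f counts are numbers of taggings. The function name and the data are hypothetical.

```python
def cohesivity(tj, expanded_tags, single, pair):
    """Sum over tk in ET_i of f(tk, tj) / (f(tj) * f(tk)).

    single: dict tag -> f(tag); pair: dict frozenset({a, b}) -> f(a, b).
    Assumption: the self term (tk == tj) is skipped.
    """
    score = 0.0
    for tk in expanded_tags:
        if tk == tj:
            continue
        joint = pair.get(frozenset((tk, tj)), 0)
        if joint:
            score += joint / (single[tj] * single[tk])
    return score

# Made-up frequency tables.
single = {"python": 3, "programming": 3, "tutorial": 2, "java": 1}
pair = {
    frozenset(("python", "programming")): 2,
    frozenset(("python", "tutorial")): 2,
    frozenset(("programming", "tutorial")): 1,
    frozenset(("java", "programming")): 1,
}
expanded = {"python", "programming", "tutorial"}
scores = {t: cohesivity(t, expanded, single, pair) for t in expanded}
```

A tag that co-occurs with many other members of the expanded set gets a high score, while a tag that drifted in through a single weak association scores low.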
Pruning $ET_i$
1. Sort the tags in $ET_i$ by popularity and take the top 7 as the suggested tag set $ST_i$.
2. Sort the tags in $ET_i$ by popularity × cohesivity and take the top 7 as the suggested tag set $ST_i$.
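The two pruning variants can be compared with a short sketch. It assumes that popularity is a per-tag count (for example, how many users applied the tag) and that cohesivity scores are already computed. The function names and the numbers are made up, and the example uses k=2 instead of 7 to keep the data small.

```python
def top_k(scores, k=7):
    """Return the k highest-scoring tags; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]

def suggest_by_popularity(popularity, k=7):
    """Variant 1: rank tags in ET_i by popularity alone."""
    return top_k(popularity, k)

def suggest_by_popularity_times_cohesivity(popularity, cohesivity, k=7):
    """Variant 2: rank tags in ET_i by popularity * cohesivity."""
    combined = {tag: popularity[tag] * cohesivity.get(tag, 0.0)
                for tag in popularity}
    return top_k(combined, k)

# Made-up scores for three tags in an expanded set.
popularity = {"python": 10, "tutorial": 6, "misc": 8}
cohesivity_scores = {"python": 0.5, "tutorial": 0.5, "misc": 0.1}

st_variant_1 = suggest_by_popularity(popularity, k=2)
st_variant_2 = suggest_by_popularity_times_cohesivity(popularity, cohesivity_scores, k=2)
```

In this example the second variant drops the popular but weakly related tag "misc" in favor of "tutorial".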
Experiment Plan
Dataset from the del.icio.us RSS API, Mar 28 to April 19: 30,000 URLs, 234,982 taggings, 8,392 users
1. del.icio.us/rss/popular every 30 min
   del.icio.us/rss/recent every 2 min
2. del.icio.us/rss/url?url=xxx.com
Suggesting tags from no user to the 10th user.