„IP“ is not always „Internet Protocol“
A long and a very short example for IP problems in Web 2.0 research
Ralf Schenkel
Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Marc Spaniol, Gerhard Weikum
September 25, 2008 Dagstuhl Perspectives Workshop Web 2.0
Social Tagging Networks
Definition: Social Tagging Network
Website where people
• publish + tag information
• review + rate information
• publish their interests
• maintain a network of friends
• interact with friends
Common examples:
• Flickr (images)
• YouTube (videos)
• del.icio.us (bookmarks)
• Librarything (books)
• Discogs (CDs)
• CiteULike (papers)
• Facebook
• Myspace (media)
Part 1: Search in Social Tagging Networks
(long)
Some Statistics
Flickr: (as of Nov 2007)
• 2+ billion photos
Facebook: (as of Apr 2007)
• 1.8 billion photos
• 31 million active users
• 100,000 new users per day
Myspace: (as of Apr 2007)
• 135 million users (6th largest country on Earth)
• 2+ billion images (150,000 req/s), millions added daily
• 25 million songs
• 60 TB of videos
Huge volume of highly dynamic data
Showcase: librarything.com
[Screenshot: a librarything.com page with callouts for Ratings, Tags, Books, Others]
librarything.com: Social Interaction
[Screenshot: a librarything.com page with callouts for Explicit Friends, Similar Users, Comments]
librarything.com: Tag Clouds
librarything.com: Search
Search results are independent of the querying user (and the social context)
Outline
• Introduction
• Modelling Social Tagging Networks
– Graph Model
– Different Information Needs
• Effective Query Scoring
• Efficient Query Evaluation
• Summary & Further Challenges
Social Network Model
[Figure: graph with three kinds of nodes (USERS, ITEMS, TAGS); labels shown: travel Norway, travel China, queueing theory]
Social Network Model
[Figure: graph with three kinds of nodes (USERS, ITEMS, TAGS); labels shown: travel Norway, travel China, queueing theory; tag labels shown: travel, trip, vldb, probability, queues, harry potter]
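A minimal sketch of how such a graph could be held in memory, assuming a network is fully described by (user, item, tag) assignments plus symmetric friendship links. The class name `TaggingNetwork`, its methods, and the user names are hypothetical and only illustrate the model; the labels reuse examples from the figure.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TaggingNetwork:
    """Toy model of a social tagging network (illustrative, not the authors' system)."""

    # (user, item) -> set of tags that this user attached to this item
    taggings: dict = field(default_factory=lambda: defaultdict(set))
    # user -> set of explicit friends (stored symmetrically)
    friends: dict = field(default_factory=lambda: defaultdict(set))

    def add_tagging(self, user, item, tag):
        self.taggings[(user, item)].add(tag)

    def add_friendship(self, a, b):
        self.friends[a].add(b)
        self.friends[b].add(a)

    def tags_of_user(self, user):
        """All tags a user has ever assigned, across all items."""
        return {t for (u, _), tags in self.taggings.items() if u == user for t in tags}

    def items_with_tag(self, tag):
        """All items that carry the tag, regardless of who assigned it."""
        return {i for (_, i), tags in self.taggings.items() if tag in tags}


# Hypothetical users; item and tag labels are borrowed from the slide.
net = TaggingNetwork()
net.add_tagging("alice", "travel Norway", "travel")
net.add_tagging("bob", "travel China", "trip")
net.add_tagging("bob", "queueing theory", "probability")
net.add_friendship("alice", "bob")
```

Keeping the user in the key preserves who assigned each tag, which is the information a user-dependent ranking needs.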
Information Need 1: Global
[Figure: the USERS / ITEMS / TAGS graph from the previous slide, with the added label "harry potter"]
Tags by all users are equally important
Information Need 2: Similar Users
[Figure: the USERS / ITEMS / TAGS graph from the previous slides, with the added label "travel" and a question mark]
Tags by users with similar tags/items are more important
Information Need 3: Trusted Friends
[Figure: the USERS / ITEMS / TAGS graph from the previous slides, with the added label "probability" and a question mark]
Tags by closely related users are more important
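The three information needs can be read as three ways of weighting the users whose tags count toward an item's score. The sketch below is only an illustration under simple assumptions, not the scoring model from the talk: the data is made up, similarity is taken to be Jaccard overlap of tag vocabularies, friendship is a 0/1 indicator, and the querying user's own taggings are ignored.

```python
from collections import defaultdict

# Made-up (user, item, tag) assignments and friendships, for illustration only.
TAGGINGS = [
    ("alice", "book_a", "travel"),
    ("bob", "book_a", "travel"),
    ("bob", "book_b", "travel"),
    ("carol", "book_b", "travel"),
    ("carol", "book_c", "travel"),
    ("dave", "book_c", "travel"),
    ("dave", "book_c", "probability"),
]
FRIENDS = {"alice": {"bob"}}


def tag_sets(taggings):
    """Map each user to the set of tags they have used."""
    out = defaultdict(set)
    for user, _, tag in taggings:
        out[user].add(tag)
    return out


def global_weight(querier, other):
    """Need 1: tags by all users are equally important."""
    return 1.0


def similar_user_weight(querier, other):
    """Need 2 (assumed measure): Jaccard overlap of the two users' tag sets."""
    tags = tag_sets(TAGGINGS)
    a, b = tags[querier], tags[other]
    return len(a & b) / len(a | b) if a | b else 0.0


def friend_weight(querier, other):
    """Need 3 (assumed measure): 1 for explicit friends, 0 otherwise."""
    return 1.0 if other in FRIENDS.get(querier, set()) else 0.0


def score_items(querier, tag, weight):
    """Sum the weights of all other users who assigned `tag` to each item."""
    scores = defaultdict(float)
    for user, item, t in TAGGINGS:
        if t == tag and user != querier:
            scores[item] += weight(querier, user)
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


for fn in (global_weight, similar_user_weight, friend_weight):
    print(fn.__name__, score_items("alice", "travel", fn))
```

With this toy data the same query from the same user produces a different ranking under each weighting, which is the effect the three slides describe.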
Wishlist for Social-Aware Social Search
• Search results depend on
– Global popularity of items
– Collection context of the querying user (books, tags)
– Social context of the querying user (trusted friends)
• Automatic tag expansion (beyond synonyms)
• Scalable query processing
• Explanation of results
(similar wishlist for social recommendations)
Fast Forward…
Imagine a 20-minute talk about quantified friendship measures, personalized scoring models, dynamic tag expansion, scalable query processing, …
Essence:
• Context-aware personalized search
• Tags from closely related users are more important
• Different kinds of „relatedness“ possible [SIGIR 2008]
Experimental Evaluation: Effectiveness
Systematic evaluation of result quality is difficult
Three possible setups:
• Manual queries + human assessments
• Queries + assessments derived from external info (e.g. DMOZ categories)
• Automated assessments from context of user
– Items tagged by friends
– Items tagged in the future
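The "items tagged in the future" setup can be sketched as a temporal split: a user's earlier taggings form the known context, and items the user first tags later are treated as relevant. This is a hedged illustration only; the event format, the function names `split_by_time` and `precision_at_k`, and the sample data are assumptions, not part of the talk.

```python
def split_by_time(events, user, cutoff):
    """Split one user's tagging events at `cutoff`.

    events: iterable of (user, item, tag, timestamp) tuples (assumed format).
    Returns (history, relevant): items tagged before the cutoff, and items
    first tagged at or after it, which serve as automatic assessments.
    """
    history = {i for u, i, _, ts in events if u == user and ts < cutoff}
    later = {i for u, i, _, ts in events if u == user and ts >= cutoff}
    return history, later - history


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked items that count as relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for item in ranking[:k] if item in relevant) / k


# Made-up events: (user, item, tag, timestamp)
EVENTS = [
    ("alice", "book_a", "travel", 1),
    ("alice", "book_b", "trip", 2),
    ("alice", "book_c", "probability", 5),
    ("alice", "book_a", "norway", 6),  # re-tagging a known item is not "new"
    ("bob", "book_d", "travel", 7),
]

history, relevant = split_by_time(EVENTS, "alice", cutoff=4)
print(history, relevant)
```

A ranking produced from the pre-cutoff data can then be scored against `relevant`, for example with `precision_at_k`, without asking any human assessor.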
Prototype Implementation
[SIGIR Demo 2008], [VLDB Demo 2008]
Preliminary User Study
LibraryThing user study: [Data Engineering Bulletin, June 2008]
• 6 librarything users with reasonably large library and friend sets
• Overall 49 queries
• Crawled (part of) librarything: ~1.3 million books, ~15 million tags, ~12,000 users, ~18,000 friends
• Measured NDCG[10]

|     | 0.0   | 0.2   | 0.5   | 0.8   | 1.0   |
| --- | ----- | ----- | ----- | ----- | ----- |
| 0.0 | 0.546 | 0.572 | 0.568 | 0.565 | 0.565 |
| 0.2 | 0.564 | 0.572 | 0.579 | 0.581 | -     |
| 0.5 | 0.539 | 0.552 | 0.559 | -     | -     |
| 0.8 | 0.515 | 0.546 | -     | -     | -     |
| 1.0 | 0.465 | -     | -     | -     | -     |

Axis labels on the slide: (1-α) (graph) and (1-α) (content)
Callout on the slide: Authors of the paper
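For readers unfamiliar with the measure, here is a minimal sketch of NDCG at a rank cutoff. The slide does not say which variant was used, so this assumes the common form with linear gains and a log2 rank discount, and it normalizes against the best possible ordering of the same judged list.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain of the first k graded relevance values."""
    # Rank 1 gets discount log2(2) = 1, rank 2 gets log2(3), and so on.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg(gains, k=10):
    """DCG divided by the DCG of the ideal (descending) ordering of `gains`."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


# Graded judgments for a ranked result list, best-first as returned by a system.
print(ndcg([3, 2, 1]))  # already in ideal order
print(ndcg([1, 3, 2]))  # same items, worse order
```

A value of 1 means the system ranked the judged items in the best possible order; lower values mean relevant items appear further down the list.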
We need a benchmark collection, but…
• Everybody „has“ data from Flickr, librarything
• Data contains private information by definition
• Data cannot be successfully anonymized (AOL)
• Data must not be anonymized (we need the users to assess results)
• Data must be large scale (a few volunteers are not enough)
• Collection must be completely available offline for stability of results (including images, …)
Part 2: Web Archiving
(very short)
Online Information is Volatile
• Huge amount of information is available only online today
• Easily lost (hardware failure, software failure, human failure, deletion, attack, …)
• Easily inaccessible (does anybody know Interleaf?)
• Easily manipulated
• How will historians learn about the 21st century?
Strong need for long-term preservation of the evolving Web