Improving Flickr discovery through Wikipedias
Index Introduction Folksonomies Linguistic issues Introducing Flickrpedia Concluding Remarks
Federico Gobbo, federico.gobbo@uninsubria.it
Università degli Studi dell’Insubria, Varese, Italy
(cc) Some rights reserved.
1 Introduction: Why folksonomies are interesting
2 Folksonomies: Why do folksonomies differ?
3 Linguistic issues: Augmented folksonomies through natural language
4 Introducing Flickrpedia: Multilingual diversity as the source of knowledge
5 Concluding Remarks
Why folksonomies are interesting
A key question of information retrieval today
How can meaningful metadata be added to web content, so as to increase the utility of information by improving the precision of information retrieval in search engines?
Folksonomies, a tentative answer. What are they?
folksonomy = folks + taxonomy
A folksonomy is made of tags or labels, usually single-word metadata attached to online items (documents, photos, videos, etc.) in order to add contextual meaning to the items themselves.
Folksonomies are a tentative effort toward the goal of improving the precision of information retrieval.
Why do folksonomies differ?
Folksonomies and traditional taxonomies
Unlike traditional taxonomies, there is no explicit hierarchy between tags, nor are tags exclusive. For example, the photo of a cat may be tagged as ‘cat’, ‘european’ and ‘animal’, but nothing says that all cats are animals: tags can be seen as common facets of the item itself (Schmitz 2006). There is no central authority, and this is the main reason why folksonomies are becoming more and more popular among users of web resources.
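The contrast between the two structures can be made concrete with a minimal sketch in Python, chosen here for illustration only. The names `taxonomy_parent`, `is_a` and `photo_tags` are hypothetical and do not come from the original work. A taxonomy records parent links between categories, whereas a folksonomy records only a flat set of tags per item:

```python
# Hypothetical illustration: a taxonomy encodes hierarchy, a folksonomy does not.

# Taxonomy: each category points to its parent (None marks the root).
taxonomy_parent = {"cat": "animal", "animal": None}


def is_a(category, ancestor):
    """Return True if `ancestor` is reachable from `category` via parent links."""
    while category is not None:
        if category == ancestor:
            return True
        category = taxonomy_parent.get(category)
    return False


# Folksonomy: a flat, non-exclusive set of tags per item (the cat photo above).
photo_tags = {"photo-1": {"cat", "european", "animal"}}

# The taxonomy states that every cat is an animal.
print(is_a("cat", "animal"))
# The folksonomy only states that both tags were attached to the same photo.
print({"cat", "animal"} <= photo_tags["photo-1"])
```

Both checks succeed, but only the first expresses a general relation between the two concepts; the second holds for this one photo alone.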
The two different scopes of folksonomies
Each tag has two different scopes at the same time:
personomy, the user-defined one (Quintarelli 2005);
consensus, the socially shared meaning.
Consensus is becoming more and more important, as the wide use of tag suggestion interfaces in web applications suggests.
Folksonomies and the Long Tail (see the video!)
The key concept of serendipity
Consensus permits serendipity, i.e. users dig through the web via tags, finding new, unexpected and useful content that is not easily accessible via traditional search engines.
Tags are used as filters, i.e. a query on several tags returns the items tagged with any of the given tags, or with all of them, depending on the application (Golder and Huberman 2006).
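The two filtering behaviours just described ("any" versus "all") can be sketched in a few lines of Python. This is only an illustration over a hypothetical in-memory collection; the names `items` and `filter_by_tags` are assumptions, not part of any real folksonomy application.

```python
# Hypothetical in-memory folksonomy: item id -> set of tags.
items = {
    "photo-1": {"cat", "animal"},
    "photo-2": {"cat", "european"},
    "photo-3": {"flower"},
}


def filter_by_tags(items, query_tags, mode="any"):
    """Return the sorted ids of the items that match the query tags.

    mode="any": the item carries at least one query tag (logical OR).
    mode="all": the item carries every query tag (logical AND).
    """
    query = set(query_tags)
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")

    def matches(tags):
        if mode == "any":
            return bool(tags & query)
        return query <= tags

    return sorted(item for item, tags in items.items() if matches(tags))


print(filter_by_tags(items, ["cat", "european"], mode="any"))  # ['photo-1', 'photo-2']
print(filter_by_tags(items, ["cat", "european"], mode="all"))  # ['photo-2']
```

The "any" mode widens the result set with every extra tag, which is the behaviour a multilingual query relies on; the "all" mode narrows it.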
The purpose of this paper is to improve serendipity by allowing people to dig through folksonomies regardless of the natural language(s) they master.
Augmented folksonomies through natural language
Tags as linguistic objects
Tags are words, i.e. alphabetical strings meaningful in some natural language. There is no controlled language. In particular, the following features go unrecognized:
synonymy (different word strings, analogous meaning);
homography (identical word string, totally different meaning);
different encoding strategies are possible (e.g. ‘28-03-2008’, ‘2008March3’, ‘3rd March 2008’);
misspellings are very frequent, so standard NLP techniques are ruled out.
Guy and Tonkin (2006) even advocated tag literacy education.
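To illustrate the encoding problem listed above, here is a rough Python sketch that maps the three quoted date tags onto calendar dates. It is an assumption-laden illustration, not a method from the original work: the names `MONTHS`, `PATTERNS` and `normalize_date_tag` are hypothetical, only English month names are handled, and only these three encodings are covered.

```python
import re
from datetime import date

# English month names only (an assumption of this sketch).
MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# One regular expression per encoding strategy (illustrative, far from complete).
PATTERNS = [
    # day-month-year with digits only, e.g. 28-03-2008
    re.compile(r"^(?P<d>\d{1,2})-(?P<m>\d{1,2})-(?P<y>\d{4})$"),
    # year, month name, day without separators, e.g. 2008March3
    re.compile(r"^(?P<y>\d{4})(?P<month>[A-Za-z]+)(?P<d>\d{1,2})$"),
    # ordinal day, month name, year, e.g. 3rd March 2008
    re.compile(r"^(?P<d>\d{1,2})(?:st|nd|rd|th)? (?P<month>[A-Za-z]+) (?P<y>\d{4})$"),
]


def normalize_date_tag(tag):
    """Return a datetime.date for a recognised date tag, or None otherwise."""
    for pattern in PATTERNS:
        match = pattern.match(tag.strip())
        if not match:
            continue
        parts = match.groupdict()
        if "m" in parts:
            month = int(parts["m"])
        else:
            month = MONTHS.get(parts["month"].lower())
        if month is None:
            return None  # unknown month name
        try:
            return date(int(parts["y"]), month, int(parts["d"]))
        except ValueError:
            return None  # impossible date such as 31-02-2008
    return None


for tag in ["28-03-2008", "2008March3", "3rd March 2008", "cat"]:
    print(tag, "->", normalize_date_tag(tag))
```

Of the three strings quoted in the list, the last two resolve to the same day, which plain string comparison of tags cannot detect. Every new convention needs a new pattern, which hints at why free-form tags resist systematic processing.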
The linguistic divide in folksonomies
Multilingualism is an issue not yet fully explored in folksonomies. In fact, tags are written in a human language, and users are inclined to write in the languages they are comfortable with.
It is certainly desirable for a user who is not comfortable with English or another major language (in terms of presence on the web) to search for and find tags using a search engine interface in his or her own language, while the engine searches for the corresponding tags in English and in other major human languages.
Multilingual diversity as the source of knowledge
How to overcome the linguistic divide?
A proposal: a dedicated web application that extracts the language-tag pairs in every available language before passing the tags to the folksonomy search engine.
The claim is an improvement in serendipity: when searching in 20 natural languages at the same time, some interesting data will be found that a single-language search would leave undiscovered.
Flickr and its API
Flickr is one of the most popular web applications for photos (a search for ‘flowers’ currently returns more than 2 million photos). Photos are freely tagged by users, so it can be considered a folksonomy.
Open source APIs in major programming languages are available, and people can query the Flickr repository with an authentication key issued on request.
http://www.flickr.com/services/api
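As a rough illustration of what a keyed query looks like, the Python sketch below builds (but does not send) a REST request for Flickr's `flickr.photos.search` method. The endpoint and the parameter names (`api_key`, `tags`, `tag_mode`, `per_page`, `format`, `nojsoncallback`) follow the public Flickr API documentation as far as I know and should be checked against it. The helper `build_tag_search_url` and the placeholder key `YOUR_API_KEY` are hypothetical, and the prototype described in these slides used a Ruby library rather than this code.

```python
from urllib.parse import urlencode

# Current public REST endpoint (assumption: the 2007 prototype may have used a different URL).
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_tag_search_url(api_key, tags, tag_mode="any", per_page=60):
    """Build the URL of a flickr.photos.search request for the given tags."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,          # key issued by Flickr on request
        "tags": ",".join(tags),      # comma-separated list of tags
        "tag_mode": tag_mode,        # "any" (OR) or "all" (AND)
        "per_page": per_page,        # number of photos per result page
        "format": "json",
        "nojsoncallback": 1,         # ask for plain JSON, not JSONP
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


url = build_tag_search_url("YOUR_API_KEY", ["Flugzeug", "airplane", "avion"])
print(url)
```

With `tag_mode="any"`, a single request covers the same concept in several languages, which is the kind of query a multilingual front end needs.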
Flickrpedia = Flickr + Wikipedias
Flickrpedia is built on an API in Ruby and on the Ruby on Rails development framework (Thomas 2005, Thomas and Heinemeier-Hansson 2005). Users query Flickr by writing a tag and specifying its natural language.
The system crawls the Wikipedia in the corresponding language and looks for an appropriate page. With the help of regular expressions, Flickrpedia parses the web page and extracts, from the appropriate box on the page, the existing language-title pairs for the same topic in other languages.
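The extraction step can be sketched as follows. This is a Python illustration, not the Ruby code of the prototype, and it runs on a hypothetical, simplified HTML fragment: real Wikipedia markup differs and changes over time. The names `SAMPLE_HTML`, `INTERWIKI_LINK` and `extract_language_pairs` are assumptions made for this example.

```python
import re
from urllib.parse import unquote

# Hypothetical, simplified fragment of the "other languages" box on the German
# page "Flugzeug"; the markup of the live site is not reproduced here.
SAMPLE_HTML = """
<div id="p-lang"><ul>
<li class="interwiki-en"><a href="http://en.wikipedia.org/wiki/Airplane">English</a></li>
<li class="interwiki-fr"><a href="http://fr.wikipedia.org/wiki/Avion">Français</a></li>
<li class="interwiki-eu"><a href="http://eu.wikipedia.org/wiki/Hegazkin">Euskara</a></li>
</ul></div>
"""

# Capture the language code and the page title of each interlanguage link.
INTERWIKI_LINK = re.compile(
    r'<li class="interwiki-(?P<lang>[a-z-]+)">\s*'
    r'<a href="http://(?P=lang)\.wikipedia\.org/wiki/(?P<title>[^"]+)"'
)


def extract_language_pairs(html):
    """Return {language code: page title} for every interlanguage link found."""
    return {
        match.group("lang"): unquote(match.group("title")).replace("_", " ")
        for match in INTERWIKI_LINK.finditer(html)
    }


pairs = extract_language_pairs(SAMPLE_HTML)
print(pairs)

# The original tag plus its translations form the multilingual tag list.
print(["Flugzeug"] + list(pairs.values()))
```

Regular expressions are brittle against markup changes, so a sketch like this would need to be revisited whenever the page layout changes.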
How Flickrpedia works
[Diagram: a German user enters the query ‘Flugzeug’ (German) in Flickrpedia; the system crawls Wikipedia, parsing with the help of regular expressions, and obtains ‘Airplane’ (English), ‘Avion’ (French), ‘Hegazkin’ (Basque), ...; the German user obtains the desired photos from Flickr!]
The web page box for “alternate languages” in Wikipedia
An example: the German word ‘Flugzeug’
The results of the German word ‘Flugzeug’
On 11 April 2007, Flickr found fewer than 10,000 photos, while Flickrpedia found more than 20,000 for the same query, including many unexpected and relevant photos.
Don’t take my word for it: try it yourself!
Word searched: ‘Flugzeug’, i.e. airplane in German
http://buffy.sciva.uninsubria.it/~rl608838/search
Flickrpedia until now
Flickrpedia only needs to store the list of wikipedias, one per natural language covered (currently 85). Large, extemporaneous shared information repositories such as Flickr can be managed through other semi-structured information repositories such as the wikipedias.
Flickrpedia, if refined beyond its current prototype phase, may help users with poor knowledge of major languages to retrieve information using only their lesser-used languages.
Further directions for Flickrpedia
Flickrpedia is far from perfect: homographs are still not handled, even though wikipedias have disambiguation pages, and it is not clear which wikipedias to choose in order to optimize serendipity.
For now, the wikipedias parsed are the largest ones in terms of wiki pages, but this does not guarantee any gain in serendipity.
Finally, the Flickr API imposes a severe limit: at most 20 tags can be included in a single query request, and at most 60 thumbnails may be returned.
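One possible way to live with the 20-tag limit is to split the translated tags into batches and issue one query per batch. The Python sketch below shows only the batching step; it is an assumption about how the limit might be handled, not a description of the prototype. The names `MAX_TAGS_PER_QUERY`, `batch_tags` and the sample tag list are hypothetical.

```python
MAX_TAGS_PER_QUERY = 20  # limit on tags per request, as reported in the text


def batch_tags(tags, size=MAX_TAGS_PER_QUERY):
    """Split `tags` into consecutive lists of at most `size` tags.

    Duplicates are dropped first (the same word may appear in several
    languages), keeping the first-seen order.
    """
    unique = list(dict.fromkeys(tags))
    return [unique[start:start + size] for start in range(0, len(unique), size)]


# 85 hypothetical translated tags, one per wikipedia language.
translated = [f"tag-{n}" for n in range(85)]
batches = batch_tags(translated)
print([len(batch) for batch in batches])  # [20, 20, 20, 20, 5]
```

Each batch would then be sent as a separate "any"-mode query and the results merged, at the cost of five requests instead of one for 85 languages.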
Beyond Flickrpedia
This approach isn’t limited to Flickr as the underlying folksonomy. Our research direction is towards generalization, i.e. users can choose the appropriate folksonomy when performing multilingual queries.
It remains to be shown how to apply this approach to folksonomies whose semantic referents are not photos: a photo of an airplane or a flower depicts the same thing in almost every human language, more or less.
The real underlying problem is how to measure serendipity, i.e. specific and precise metrics for serendipity are needed.
Thank you. Any questions?
Download these slides at the following permalink:
http://purl.org/net/fgobbo
(cc) F. Gobbo 2007. Published in Italy.
Attribution-NonCommercial-ShareAlike 2.5