From Clouds to Trees: Clustering Delicious Tags
EuroIA Paris - September 2010
From clouds to trees
Stefano Bussolon
September 28, 2010
Description of the work
Introduction
Delicious is a social bookmarking service that allows you to tag, save, manage and share Web pages all in one place.
Users can save their bookmarks on it. A bookmark record has 6 fields: the URL of the resource to save, its title, an optional notes field, an optional tags field, and two other fields for bookmark sharing.
Tags are optional but strongly suggested, because they make the bookmarks easier to organize and navigate.
Tags co-occurrences
Every time a user employs more than one tag when saving a bookmark, she implicitly states a link between them.
The aim of this research is to understand whether a statistical analysis of a significant number of those co-occurrences can let meaningful clusters emerge.
More specifically, the approach I followed was to collect a corpus of Delicious bookmarks, to rank the most frequently used tags, to calculate the co-occurrences between those tags, and to analyze the resulting matrix with dimensional scaling methods to let the hidden, implicit structures emerge.
Related work
✔ Salonen (2007)
✔ Shepitsen et al. (2008)
✔ Guo et al. (2009)
✔ Begelman et al. (2006)
✔ Zhou and King (2009)
Data Mining
Using a daemon, I collected 120768 Delicious bookmarks (14 December 2009 - 19 January 2010):
✔ 96117 distinct links
✔ 77242 distinct users
✔ 369856 total tags
✔ 59371 distinct tags
I selected the 500 most frequently used tags.
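The counting and selection step can be sketched as follows. This is only an illustration, assuming each bookmark is available as a plain list of tag strings; the function name `most_frequent_tags` and the toy corpus are hypothetical and are not taken from the original work.

```python
from collections import Counter

def most_frequent_tags(bookmarks, top_n=500):
    """Count tag usage over all bookmarks and return the top_n (tag, count) pairs."""
    counts = Counter(tag for tags in bookmarks for tag in tags)
    return counts.most_common(top_n)

# Toy corpus (invented for illustration): one list of tags per saved bookmark.
bookmarks = [
    ["design", "css"],
    ["design", "tools"],
    ["design", "css", "jquery"],
]
print(most_frequent_tags(bookmarks, top_n=2))
```

With the real corpus, `top_n=500` would give the ranked list used in the rest of the analysis.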
The most frequent tags
1) design 6298
2) tools 4271
3) webdesign 3939
4) blog 3535
5) inspiration 3137
6) software 2855
7) programming 2663
...
498) presentations 92
499) study 92
500) women 91
Tags frequency
The frequency distribution of the tags follows a typical Zipf's law.
[Plot: horizontal axis from 0 to 500, vertical axis from 0 to 6000]
Figure 1: Tags frequency
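One rough way to check for a Zipf-like distribution is to fit a straight line to log(frequency) against log(rank): under an ideal Zipf's law the slope is close to -1. The sketch below is an assumption about how such a check could be done, not the procedure used for the figure, and `zipf_slope` is a hypothetical helper. The sample values are only the seven top frequencies listed on the previous slide, so the printed slope is illustrative.

```python
import math

def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Only the head of the distribution, as reported on the slide.
top_counts = [6298, 4271, 3939, 3535, 3137, 2855, 2663]
print(round(zipf_slope(top_counts), 2))
```

A meaningful fit would need all 500 frequencies, not just the head of the list.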
Co-occurrence matrix
I created a co-occurrence matrix of the 500 most frequent tags.
The co-occurrence matrix counts the times two tags are used in the same bookmark.
Figure 2: Co-occurrence matrix
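A minimal sketch of how such a matrix could be built, assuming again that every bookmark is a list of tag strings and that `vocabulary` holds the selected most frequent tags. The names are hypothetical, and tags outside the vocabulary are simply ignored.

```python
from itertools import combinations

def cooccurrence_matrix(bookmarks, vocabulary):
    """Symmetric matrix: cell [i][j] counts the bookmarks that use both tag i and tag j."""
    index = {tag: i for i, tag in enumerate(vocabulary)}
    size = len(vocabulary)
    matrix = [[0] * size for _ in range(size)]
    for tags in bookmarks:
        # A set makes sure a tag repeated in one bookmark is counted once.
        present = sorted({index[tag] for tag in tags if tag in index})
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix

# Toy data, invented for illustration.
bookmarks = [
    ["design", "css"],
    ["design", "tools"],
    ["design", "css", "jquery"],
]
for row in cooccurrence_matrix(bookmarks, ["design", "css", "tools"]):
    print(row)
```

The diagonal is left at zero here; whether to store each tag's own frequency on the diagonal is a design choice the slides do not specify.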
Items frequency and variance
A sub-matrix (200*500) of co-occurrences was used to calculate a Principal Component Analysis.
Principal Components Analysis (PCA) is an exploratory multivariate statistical technique for simplifying complex data sets (Raychaudhuri et al., 2000), and the first eigenvectors can be used to map the elements onto the space of the principal components.
The results of the PCA, however, were biased by the frequency distribution of the tags, because the most used tags monopolized the variance of the PCA.
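The frequency bias can be illustrated with a small sketch. It computes only the first principal component (the leading eigenvector of the covariance matrix) by power iteration, on invented counts in which one column, standing for a very frequent tag, has far larger values than the others. The data, the helper name and the choice of power iteration are all assumptions made for illustration.

```python
import math

def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the covariance matrix of `rows`, by power iteration."""
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    vector = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    return vector

# Invented counts: each row is one observation, each column one tag.
tags = ["design", "css", "recipe"]
counts = [[60, 5, 0], [40, 6, 1], [80, 4, 0], [20, 1, 7], [10, 0, 9]]
component = first_principal_component(counts)
for tag, weight in zip(tags, component):
    print(f"{tag}: {weight:+.2f}")
```

In this toy case almost all of the weight of the first component falls on the high-count column, which is the kind of monopolization of variance described above.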
Varimax and Factor Analysis
I therefore decided to use a Factor Analysis with Varimax rotation instead.
The Varimax rotation overcame the frequency bias of the tags, leading to much more insightful results.
The adoption of Factor Analysis as a multidimensional scaling statistic led to two main advantages:
✔ thanks to the varimax rotation, it overcomes the frequency distribution bias;
✔ the resulting distribution of the loadings on each dimension is unipolar, leading to an easier interpretation of the results.
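For readers unfamiliar with the rotation step, here is a sketch of raw varimax using Kaiser's classical pairwise plane rotations. It covers only the rotation of an already extracted loading matrix, not the factor extraction itself, and it is not claimed to be the implementation used in this work (a statistical package was presumably used). The function names and the toy loadings are hypothetical.

```python
import math

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    n, k = len(loadings), len(loadings[0])
    total = 0.0
    for f in range(k):
        squares = [row[f] ** 2 for row in loadings]
        total += sum(s * s for s in squares) / n - (sum(squares) / n) ** 2
    return total

def varimax(loadings, max_sweeps=50, tol=1e-8):
    """Rotate pairs of factors until the varimax criterion stops improving."""
    rotated = [list(row) for row in loadings]
    n, k = len(rotated), len(rotated[0])
    for _ in range(max_sweeps):
        largest = 0.0
        for p in range(k - 1):
            for q in range(p + 1, k):
                u = [r[p] ** 2 - r[q] ** 2 for r in rotated]
                v = [2 * r[p] * r[q] for r in rotated]
                a, b = sum(u), sum(v)
                c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                d = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the criterion for this pair of factors.
                angle = math.atan2(d - 2 * a * b / n, c - (a * a - b * b) / n) / 4
                largest = max(largest, abs(angle))
                cos_a, sin_a = math.cos(angle), math.sin(angle)
                for r in rotated:
                    r[p], r[q] = r[p] * cos_a + r[q] * sin_a, -r[p] * sin_a + r[q] * cos_a
        if largest < tol:
            break
    return rotated

# Invented loadings: every item loads equally (in size) on both factors.
raw = [[0.6, 0.6], [0.5, 0.5], [0.7, -0.7], [0.4, -0.4]]
for row in varimax(raw):
    print([round(x, 3) for x in row])
```

After rotation each toy item loads on a single factor, which is the "simple structure" that makes the factors easier to interpret.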
The factors
Factors 1 and 2
[Scatter plot: the tags positioned by their loadings on factors 1 and 2; axis ticks from 0.0 to 0.8]
Factors 3 and 4
[Scatter plot: the tags positioned by their loadings on factors 3 and 4; axis ticks from 0.0 to 0.8]
Factors 5 and 6
[Scatter plot: the tags positioned by their loadings on factors 5 and 6; axis ticks from 0.0 to 0.8]
Figure 3: Factors 5 and 6
Factors 7 and 8
[Scatter plot: the tags positioned by their loadings on factors 7 and 8; axis ticks from 0.0 to 0.6]
Figure 4: Factors 7 and 8
Factors 9 and 10
[Scatter plot: the tags positioned by their loadings on factors 9 and 10; axis ticks from -0.2 to 0.6]
Figure 5: Factors 9 and 10
Factors 11 and 12
[Scatter plot: the tags positioned by their loadings on factors 11 and 12; axis ticks from -0.2 to 0.6]
Figure 6: Factors 11 and 12
Clusters by loadings
From a graphical visualization of the factors, it was clear that they were, overall, semantically consistent.
At first, I applied a k-means clustering algorithm to the loadings of the factor analysis. The results, however, seemed sub-optimal, given the strong indications arising from the factor analysis.
I therefore decided to assign each item to the factor on which it has the lowest rank.
The algorithm is the following:
✔ for every factor, order the items by their loadings, and calculate a rank.
✔ for every item, assign it to the factor with the minimum rank.
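The two steps above can be sketched as follows. The sketch assumes that rank 1 goes to the highest loading on a factor and that an item with the same rank on two factors goes to the first of them; neither detail is stated on the slide. The function name and the toy loadings are hypothetical.

```python
def cluster_by_min_rank(loadings, items):
    """Assign every item to the factor on which it has its best (lowest) rank."""
    n_factors = len(loadings[0])
    # ranks[i][f] is the rank of item i on factor f (1 = highest loading).
    ranks = [[0] * n_factors for _ in items]
    for f in range(n_factors):
        order = sorted(range(len(items)), key=lambda i: loadings[i][f], reverse=True)
        for rank, i in enumerate(order, start=1):
            ranks[i][f] = rank
    clusters = {f: [] for f in range(n_factors)}
    for i, item in enumerate(items):
        best = min(range(n_factors), key=lambda f: ranks[i][f])
        clusters[best].append((ranks[i][best], item))
    # Inside each cluster, list the items from best to worst rank.
    return {f: [item for _, item in sorted(members)] for f, members in clusters.items()}

# Invented loadings on two factors.
items = ["css", "jquery", "recipe", "cooking", "travel"]
loadings = [[0.8, 0.1], [0.7, 0.0], [0.1, 0.9], [0.0, 0.6], [0.05, 0.3]]
print(cluster_by_min_rank(loadings, items))
```

Because the members of each cluster come out in rank order, the first item of every list is the one with the highest loading on that factor.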
The head and the tail of the lists
Using this algorithm, the items are ordered by their loadings on the factor. The first item is therefore the most representative of the cluster. This is a good side effect, because it increases the information scent of the categories.
The items at the tail of the list, however, often do not fit very well with the rest of the category. For the sake of simplicity they were nonetheless included in those clusters.
This solution, however, is too simplistic. It would be more correct to try a different solution.
The clusters
Clusters 1-3
✔ graphic portfolio inspiration agency illustration graphics art architecture typography blog interface webdesign ideas home ux shopping shop
✔ windows utilities freeware opensource backup linux computer mac security network apps osx sysadmin software microsoft productivity server apple hardware ubuntu management
✔ socialnetworking facebook twitter social socialmedia marketing trends media community blogging business internet collaboration networking email advertising startup work jobs
Clusters 4-6
✔ java python code php api development framework library programming toread performance database ruby iphone rails android mobile
✔ teaching learning math science research education reading technology english language writing interactive wiki games data
✔ youtube tv videos streaming movies film animation ping.fm funny video humor flash game
Clusters 7-9
✔ generator browser tool testing seo search analytics website google pdf web2.0 online usability statistics web firefox
✔ ajax plugin jquery plugins webdev javascript css html ui design
✔ culture news politics interesting blogs tech history magazine article economics psychology health information environment food finance money cooking recipe recipes
Clusters 10-12
✔ photos photo images image photography travel digital gallery fun cool fashion 3d maps crafts visualization
✔ guide tutorials tips howto photoshop drupal tutorial diy resource wordpress templates reference resources list themes mysql electronics 2009 freelance
✔ download mp3 radio audio ebooks icons music books book tools fonts font free
Frequency and rank
Interestingly, the most used tags were not the most representative of their clusters.
Design ranked 15th in its cluster, tools 14th, webdesign 14th, blog 10th, software 14th. Only inspiration got a significant 3rd place in its group.
We could hypothesize that the most used tags did not polarize their clusters because they are too general, too broad. In the terms of the theory of Rosch et al. (1976), they are not at the basic level, but at a superordinate level.
Travels and frameworks
Among others, two tags have a particular status: travel and framework. Both these tags got a good rank in two distinct factors.
Travel was 6th on factor 10 (photos, images ...) and 7th on factor 9 (culture, news, politics).
Framework was 7th on factor 4 (java, python ...) and 8th on factor 8 (ajax, plugin ... css ...).
A possible difference:
✔ for framework, a java framework and a css framework are two different things that share just a family resemblance (Wittgenstein, 1953);
✔ for travel, photos and culture are two distinct dimensions of the same user experience.
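Tags such as travel and framework could be spotted automatically by listing the items that fall among the top-ranked items of more than one factor. The sketch below is one hypothetical way to do this; the cut-off `top_k`, the function name and the toy loadings are assumptions, not part of the original analysis.

```python
def multi_factor_items(loadings, items, top_k=10):
    """Return the items ranked within the top_k of at least two factors."""
    n_factors = len(loadings[0])
    hits = {item: [] for item in items}
    for f in range(n_factors):
        order = sorted(range(len(items)), key=lambda i: loadings[i][f], reverse=True)
        for i in order[:top_k]:
            hits[items[i]].append(f)
    return {item: factors for item, factors in hits.items() if len(factors) > 1}

# Invented loadings: "framework" is salient on both factors.
items = ["java", "css", "framework", "recipe"]
loadings = [[0.8, 0.0], [0.1, 0.9], [0.5, 0.6], [0.0, 0.05]]
print(multi_factor_items(loadings, items, top_k=2))
```

Such a list only flags candidates; deciding whether a tag is a case of family resemblance or of two dimensions of one experience still requires human judgment.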
Applications of the method
A Delicious taxonomy
Example: the most popular links, grouped by categories.
Another possibility: a directory of categories
Figure 7: see http://www.hyperlabs.net/tree/
A method to generate a taxonomy from tags
This methodology can be applied to every social tagging system: for example Flickr.
It is applicable to your site, if it uses tags, folksonomies or a controlled vocabulary.
Methodological generalization
First abstractions: card sorting, tags → matrix
The co-occurrence matrix is very similar to the matrix generated by the analysis of a card sorting.
The main difference: the items frequency bias in the folksonomy.
Therefore, the methods that can be applied to card sorting (see, for example, Bussolon (2009)) can also be applied to a tagging system, and vice versa. Statistical re-usability.
Second generalization: matrix → graph
The co-occurrence matrices can be seen and represented as graphs.
There is an important research field devoted to graph clustering. Can it be applied to card sorting or folksonomies?
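As a rough illustration of the matrix → graph step, a co-occurrence matrix can be read as a weighted undirected graph: tags are nodes and every non-zero cell is an edge. The sketch below also groups the nodes by connected components after dropping weak edges, which is only the most naive form of graph clustering and is offered as an assumption, not as a method proposed in this work. All names and data are hypothetical.

```python
def matrix_to_graph(matrix, labels, min_weight=1):
    """Adjacency dict: graph[a][b] is the co-occurrence count of tags a and b."""
    graph = {label: {} for label in labels}
    for i, row in enumerate(matrix):
        for j, weight in enumerate(row):
            if i != j and weight >= min_weight:
                graph[labels[i]][labels[j]] = weight
    return graph

def connected_components(graph):
    """Group the nodes that can reach each other through the kept edges."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components

# Invented co-occurrence counts.
labels = ["css", "jquery", "recipe", "cooking"]
matrix = [[0, 5, 0, 0], [5, 0, 1, 0], [0, 1, 0, 4], [0, 0, 4, 0]]
graph = matrix_to_graph(matrix, labels, min_weight=2)
print(connected_components(graph))
```

Dedicated graph clustering algorithms would use the edge weights rather than a fixed threshold.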
Theoretical results
Discovery of the implicit dimensions
PCA on card sorting; Factor Analysis on folksonomies.
In both cases, the implicit dimensions of the mental models of the users do emerge.
Graphs and multi-dimensional spaces
From a technological point of view, the Internet is a directed graph. From a cognitive point of view, however, this graph is useless, because it cannot be traversed by a human.
The Internet graph is information but, as is, it is not knowledge.
The multi-dimensional space is a better approximation of the cognitive models of the users.
Clusters, dimensions and conjunctions
The multi-dimensional space can be reduced to clusters for the most typical elements. Example: Paris and Tour Eiffel can be reduced to a cluster.
When we are looking for elements that are a conjunction of more than one feature, however, the multidimensional space is a better representation.
Example: Argentine tango in Paris is the conjunction of two things usually belonging to two distinct clusters: Paris and Argentine tango. A multi-dimensional space can help us to let emerge resources that are salient for both the relevant dimensions, allowing the representation and the retrieval of the conjunctive elements.
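The retrieval of conjunctive elements can be sketched by scoring every resource with the smaller of its two saliences, so that only resources salient on both dimensions rank high. Using the minimum as the conjunction is an assumption made here for illustration, and the resource names and salience values are invented.

```python
def conjunctive_matches(resources, dim_a, dim_b, top_n=3):
    """Rank resources by the weaker of their saliences on the two dimensions."""
    scored = [
        (min(coords.get(dim_a, 0.0), coords.get(dim_b, 0.0)), name)
        for name, coords in resources.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_n] if score > 0]

# Invented resources with their salience on two dimensions.
resources = {
    "Eiffel Tower guide": {"paris": 0.9, "tango": 0.0},
    "Buenos Aires milonga": {"paris": 0.0, "tango": 0.9},
    "Tango nights in Paris": {"paris": 0.7, "tango": 0.8},
}
print(conjunctive_matches(resources, "paris", "tango"))
```

A hard assignment to a single cluster would place each resource under either Paris or tango, while the coordinates in the multi-dimensional space keep both pieces of information available.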
The dynamic nature of the multispace
The multidimensional space is dynamic: the dimensions change. Some new dimensions can arise, others get less attention.
This dynamic is coherent with the theory of ad hoc categories by Barsalou (1983).
The salience of the dimensions dynamically changes every time the goals of the users change.
The multispace that emerged from my analysis is the resultant vector of the multispaces of the 77000 users who saved the bookmarks of my Delicious corpus.
Bookmarks as pheromones
While moving in the environment, insects release pheromones, particular chemical components that work as turning signals for the other conspecifics.
We can see pheromones as a sort of augmented reality for the insects, a form of augmented cognition (Chandrasekharan and Stewart, 2007).
Delicious users leave a track every time they save a bookmark as public. If we can map those tracks of digital pheromones, we can get, and give to other users, an augmented map of the cognitive multispace of the subset of resources tagged on Delicious.
Tempus fugit ... questions?
Stefano Bussolon
www.bussolon.it
stefano@bussolon.it
@sweetdreamerit
linkedin
facebook
References
References (1)
✔ Barsalou, L. W. (1983). Ad hoc categories. Memory and Cognition, 11(3):211-227
✔ Begelman, G., Keller, P., and Smadja, F. (2006). Automated Tag Clustering: Improving search and exploration in the tag space. In WWW 2006
✔ Bussolon, S. (2009). Card sorting, category validity, and contextual navigation. Journal of Information Architecture, 1(2):16-41
✔ Chandrasekharan, S. and Stewart, T. (2007). The origin of epistemic structures and proto-representations. Adaptive Behavior-Animals, Animats, Software Agents, Robots, Adaptive Systems, 15(3):329-353
References (2)
✔ Guo, L., Jacob, E., and George, N. (2009). Longitudinal analysis of tag structure in del.icio.us
✔ Raychaudhuri, S., Stuart, J. M., and Altman, R. B. (2000). Principal components analysis to summarize microarray experiments: application to sporulation time series. Pacific Symposium on Biocomputing, 5:452-463
✔ Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., and Boyes Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8:382-439
✔ Salonen, J. (2007). Self-organising map based tag clouds. In Proceedings of 1st OPAALS workshop
References (3)
✔ Shepitsen, A., Gemmell, J., Mobasher, B., and Burke, R. (2008). Personalized recommendation in social tagging systems using hierarchical clustering. In Proceedings of the 2008 ACM conference on Recommender systems, pages 259-266. ACM
✔ Wittgenstein, L. (1953). Philosophische Untersuchungen. Blackwell, Oxford. English translation Philosophical Investigations; trans. and ed. by G.E.M. Anscombe; second edition 1958
✔ Zhou, T. and King, I. (2009). Automobile, car and BMW: horizontal and hierarchical approach in social tagging systems. In Proceeding of the 2nd ACM workshop on Social web search and mining, pages 25-32. ACM