From Clouds to Trees: Clustering Delicious Tags


Transcript of From Clouds to Trees: Clustering Delicious Tags

Page 1: From Clouds to Trees: Clustering Delicious Tags

EuroIA Paris - September 2010 - 1 / 44

From clouds to trees

Stefano Bussolon

September 28, 2010

Page 2: From Clouds to Trees: Clustering Delicious Tags

Description of the work

✔ Description of the work
✔ Introduction
✔ Tags co-occurrences
✔ Related work
✔ Data Mining
✔ The most frequent tags
✔ Tags frequency
✔ Co-occurrence matrix
✔ Items frequency and variance
✔ Varimax and Factor Analysis
✔ The factors
✔ The clusters
✔ Applications of the method
✔ Methodological generalization
✔ Theoretical results
✔ References

Page 3: From Clouds to Trees: Clustering Delicious Tags

Introduction

Delicious is a social bookmarking service that allows you to tag, save, manage and share Web pages all in one place.

Users can save their bookmarks on it. A bookmark record has 6 fields: the URL of the resource to save, its title, an optional notes field, an optional tags field, and two other fields for bookmark sharing.

Tags are optional but strongly suggested, because they make the bookmarks easier to organize and navigate.

Page 4: From Clouds to Trees: Clustering Delicious Tags

Tags co-occurrences

Every time a user employs more than one tag in saving a bookmark, she implicitly states a link between them.

The aim of this research is to understand whether a statistical analysis of a significant number of those co-occurrences can let meaningful clusters emerge.

More specifically, the approach I followed was to collect a corpus of Delicious bookmarks, to rank the most frequently used tags, to calculate the co-occurrence between those tags, and to analyze the resulting matrix with some dimensional scaling methods to let the hidden, implicit structures emerge.

Page 5: From Clouds to Trees: Clustering Delicious Tags

Related work

✔ Salonen (2007)
✔ Shepitsen et al. (2008)
✔ Guo et al. (2009)
✔ Begelman et al. (2006)
✔ Zhou and King (2009)

Page 6: From Clouds to Trees: Clustering Delicious Tags

Data Mining

Using a daemon, I collected 120768 Delicious bookmarks (14 December 2009 - 19 January 2010):

✔ 96117 distinct links
✔ 77242 distinct users
✔ 369856 total tags
✔ 59371 distinct tags

I selected the 500 most frequently used tags.

Page 7: From Clouds to Trees: Clustering Delicious Tags

The most frequent tags

1) design 6298
2) tools 4271
3) webdesign 3939
4) blog 3535
5) inspiration 3137
6) software 2855
7) programming 2663
...
498) presentations 92
499) study 92
500) women 91
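The ranking step can be sketched in a few lines. This is only an illustration: the function name `top_tags` and the toy corpus are assumptions, not the code used for the talk, and counting a tag once per bookmark is a choice made here.

```python
from collections import Counter


def top_tags(bookmarks, n):
    """Return the n most frequent tags as (tag, count) pairs, most frequent first."""
    counts = Counter()
    for tags in bookmarks:
        # Count each tag once per bookmark, even if the user repeated it.
        counts.update(set(tags))
    return counts.most_common(n)


# Hypothetical toy corpus: each bookmark is the list of tags attached to it.
bookmarks = [
    ["design", "webdesign", "inspiration"],
    ["design", "tools"],
    ["programming", "tools", "software"],
    ["design", "css"],
]
print(top_tags(bookmarks, 2))
```

On the real corpus the same call with n=500 would give the list of tags used in the rest of the analysis.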

Page 8: From Clouds to Trees: Clustering Delicious Tags

Tags frequency

The frequency distribution of the tags follows a typical Zipf's law.

[Plot: x axis from 0 to 500 (the 500 selected tags), y axis from 0 to 6000 (tag frequency)]

Figure 1: Tags frequency
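One way to check a Zipf-like shape is to fit a straight line to log(frequency) against log(rank): under Zipf's law the points fall close to a line with negative slope. The sketch below is an assumption about how such a check could be done, not part of the original analysis; `zipf_exponent` is a hypothetical helper and the data is synthetic.

```python
import math


def zipf_exponent(frequencies):
    """Least-squares slope of log(frequency) against log(rank), sign-flipped."""
    freqs = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


# Synthetic frequencies that follow f(rank) = 6000 / rank exactly (not the real corpus).
synthetic = [6000 / rank for rank in range(1, 501)]
print(round(zipf_exponent(synthetic), 3))
```

An exponent near 1 corresponds to the classic form of the law; real tag data would give some other positive value.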

Page 9: From Clouds to Trees: Clustering Delicious Tags

Co-occurrence matrix

I created a co-occurrence matrix of the 500 most frequent tags.

The co-occurrence matrix counts the number of times two tags are used in the same bookmark.

Figure 2: Co-occurrence matrix
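A minimal sketch of how such a matrix can be built, assuming each bookmark is available as a list of tags. The function name, the toy data and the choice to leave the diagonal at zero are assumptions; the slides do not say how the diagonal was handled.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_matrix(bookmarks, tags):
    """Build a symmetric matrix counting bookmarks in which two tags appear together."""
    index = {tag: i for i, tag in enumerate(tags)}
    pair_counts = Counter()
    for bookmark_tags in bookmarks:
        # Keep only the selected tags, once each, in a stable order.
        present = sorted({index[t] for t in bookmark_tags if t in index})
        pair_counts.update(combinations(present, 2))
    size = len(tags)
    matrix = [[0] * size for _ in range(size)]
    for (i, j), count in pair_counts.items():
        matrix[i][j] = count
        matrix[j][i] = count
    return matrix


tags = ["design", "webdesign", "css"]
bookmarks = [
    ["design", "webdesign"],
    ["design", "webdesign", "css"],
    ["css", "python"],  # "python" is not a selected tag, so it is ignored
]
for tag, row in zip(tags, cooccurrence_matrix(bookmarks, tags)):
    print(tag, row)
```

Tags outside the selected list are dropped before counting, which mirrors restricting the matrix to the 500 most frequent tags.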

Page 10: From Clouds to Trees: Clustering Delicious Tags

Items frequency and variance

A sub-matrix (200*500) of co-occurrences was used to compute a Principal Component Analysis.

Principal Component Analysis (PCA) is an exploratory multivariate statistical technique for simplifying complex data sets (Raychaudhuri et al., 2000), and the first eigenvectors can be used to map the elements onto the space of the principal components.

The results of the PCA, however, were biased by the frequency distribution of the tags, because the most used tags monopolized the variance of the PCA.

Page 11: From Clouds to Trees: Clustering Delicious Tags

Varimax and Factor Analysis

I therefore decided to use a Factor Analysis with Varimax rotation instead.

The Varimax rotation overcame the frequency bias of the tags, leading to much more insightful results.

Adopting Factor Analysis as a multidimensional scaling statistic led to two main advantages:

✔ thanks to the varimax rotation, it overcomes the frequency distribution bias;
✔ the resulting distribution of the loadings on the dimensions is unipolar, leading to an easier interpretation of the results.
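The slides do not spell out the rotation criterion. For orientation only, the raw varimax criterion as it is usually defined is given below; the symbols are assumptions introduced here: p is the number of tags, k the number of factors, and \lambda_{ij} the loading of tag i on factor j.

```latex
V = \sum_{j=1}^{k} \left[ \frac{1}{p} \sum_{i=1}^{p} \lambda_{ij}^{4}
    - \left( \frac{1}{p} \sum_{i=1}^{p} \lambda_{ij}^{2} \right)^{2} \right]
```

The rotation looks for the orthogonal rotation of the factors that maximizes V, that is, the sum over factors of the variance of the squared loadings. High values mean that each factor has a few large loadings and many near zero, which is what makes the factors easier to read.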

Page 12: From Clouds to Trees: Clustering Delicious Tags

The factors

✔ Factors 1 and 2
✔ Factors 3 and 4
✔ Factors 5 and 6
✔ Factors 7 and 8
✔ Factors 9 and 10
✔ Factors 11 and 12
✔ Clusters by loadings
✔ The head and the tail of the lists

Page 13: From Clouds to Trees: Clustering Delicious Tags

Factors 1 and 2

[Plot: loadings of the tags on factors 1 and 2, axis ticks from 0.0 to 0.8. Each tag (design, tools, webdesign, blog, inspiration, ...) is drawn as a label at its loading coordinates; the positions are not recoverable from this transcript.]

Page 14: From Clouds to Trees: Clustering Delicious Tags

Factors 3 and 4

[Plot: loadings of the tags on factors 3 and 4, axis ticks from 0.0 to 0.8. Each tag is drawn as a label at its loading coordinates; the positions are not recoverable from this transcript.]

Page 15: From Clouds to Trees: Clustering Delicious Tags

Factors 5 and 6

[Plot: loadings of the tags on factors 5 and 6, axis ticks from 0.0 to 0.8. Each tag is drawn as a label at its loading coordinates; the positions are not recoverable from this transcript.]

Figure 3: Factors 5 and 6

Page 16: From Clouds to Trees: Clustering Delicious Tags

Factors 7 and 8

[Plot: loadings of the tags on factors 7 and 8, axis ticks from 0.0 to 0.6. Each tag is drawn as a label at its loading coordinates; the positions are not recoverable from this transcript.]

Figure 4: Factors 7 and 8

Page 17: From Clouds to Trees: Clustering Delicious Tags

Factors 9 and 10

[Plot: loadings of the tags on factors 9 and 10, axis ticks from -0.2 to 0.6. Each tag is drawn as a label at its loading coordinates; the positions are not recoverable from this transcript.]

Figure 5: Factors 9 and 10

Page 18: From Clouds to Trees: Clustering Delicious Tags

Factors 11 and 12

[Plot: loadings of the tags on factors 11 and 12, axis ticks from -0.2 to 0.6. Each tag is drawn as a label at its loading coordinates; the positions are not recoverable from this transcript.]

Figure 6: Factors 11 and 12

Page 19: From Clouds to Trees: Clustering Delicious Tags

Clusters by loadings

From a graphical visualization of the factors, it was clear that they were, overall, semantically consistent.

At first, I applied a k-means clustering algorithm to the loadings of the factor analysis. The results, however, seemed sub-optimal, given the strong indications arising from the factor analysis.

I therefore decided to assign each item to the factor on which it has the lowest (best) rank.

The algorithm is the following:

✔ for every factor, order the items by their loadings, and calculate a rank;
✔ for every item, assign it to the factor with the minimum rank.
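The two steps above can be sketched as follows. This is a reading of the slide, not the author's code: the function name, the toy loadings and the tie-breaking rule (ties go to the lower-numbered factor) are assumptions.

```python
def cluster_by_min_rank(loadings):
    """loadings maps item -> list of loadings, one per factor.

    Returns one cluster per factor; each cluster lists its items
    ordered by decreasing loading on that factor.
    """
    items = list(loadings)
    n_factors = len(next(iter(loadings.values())))
    # Step 1: for every factor, order the items by loading and record the rank.
    ranks = {item: [] for item in items}
    for f in range(n_factors):
        ordered = sorted(items, key=lambda item: loadings[item][f], reverse=True)
        for rank, item in enumerate(ordered, start=1):
            ranks[item].append(rank)
    # Step 2: assign every item to the factor where its rank is lowest.
    # Ties go to the lower-numbered factor (an assumption; the slides do not say).
    clusters = [[] for _ in range(n_factors)]
    for item in items:
        best = min(range(n_factors), key=lambda f: ranks[item][f])
        clusters[best].append(item)
    for f, members in enumerate(clusters):
        members.sort(key=lambda item: ranks[item][f])
    return clusters


# Hypothetical loadings on two factors, invented for illustration.
loadings = {
    "photo": [0.8, 0.1],
    "images": [0.7, 0.2],
    "python": [0.1, 0.9],
    "java": [0.2, 0.6],
    "travel": [0.5, 0.15],
}
print(cluster_by_min_rank(loadings))
```

Because every item is compared through ranks rather than raw loadings, each item lands in exactly one cluster, and the head of each list is the item with the highest loading on that factor.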

Page 20: From Clouds to Trees: Clustering Delicious Tags

The head and the tail of the lists

Using this algorithm, the items are ordered by their loadings on the factor. The first item is therefore the most representative of the cluster. This is a good side effect, because it increases the information scent of the categories.

The items at the tail of the list, however, often do not fit very well with the rest of the category. For the sake of simplicity they were nonetheless included in those clusters.

This solution, however, is too simplistic. It would be more correct to try a different solution.

Page 21: From Clouds to Trees: Clustering Delicious Tags

The clusters

✔ Clusters 1-3
✔ Clusters 4-6
✔ Clusters 7-9
✔ Clusters 10-12
✔ Frequency and rank
✔ Travels and frameworks

Page 22: From Clouds to Trees: Clustering Delicious Tags

Clusters 1-3

✔ graphic portfolio inspiration agency illustration graphics art architecture typography blog interface webdesign ideas home ux shopping shop
✔ windows utilities freeware opensource backup linux computer mac security network apps osx sysadmin software microsoft productivity server apple hardware ubuntu management
✔ socialnetworking facebook twitter social socialmedia marketing trends media community blogging business internet collaboration networking email advertising startup work jobs

Page 23: From Clouds to Trees: Clustering Delicious Tags

Clusters 4-6

✔ java python code php api development framework library programming toread performance database ruby iphone rails android mobile
✔ teaching learning math science research education reading technology english language writing interactive wiki games data
✔ youtube tv videos streaming movies film animation ping.fm funny video humor flash game

Page 24: From Clouds to Trees: Clustering Delicious Tags

Clusters 7-9

✔ generator browser tool testing seo search analytics website google pdf web2.0 online usability statistics web firefox
✔ ajax plugin jquery plugins webdev javascript css html ui design
✔ culture news politics interesting blogs tech history magazine article economics psychology health information environment food finance money cooking recipe recipes

Page 25: From Clouds to Trees: Clustering Delicious Tags

Clusters 10-12

✔ photos photo images image photography travel digital gallery fun cool fashion 3d maps crafts visualization
✔ guide tutorials tips howto photoshop drupal tutorial diy resource wordpress templates reference resources list themes mysql electronics 2009 freelance
✔ download mp3 radio audio ebooks icons music books book tools fonts font free

Page 26: From Clouds to Trees: Clustering Delicious Tags

Frequency and rank

Interestingly, the most used tags were not the most representative of their clusters.

Design ranked 15th in its cluster, tools 14th, webdesign 14th, blog 10th, software 14th. Only inspiration reached a significant 3rd place in its group.

We could hypothesize that the most used tags did not polarize their clusters because they are too general, too broad. In the terms of the theory of Rosch et al. (1976), they are not at the basic level, but at a superordinate level.

Page 27: From Clouds to Trees: Clustering Delicious Tags

Travels and frameworks

Among others, two tags have a particular status: travel and framework. Both these tags got a good rank in two distinct factors.

Travel was 6th on factor 10 (photos, images ...) and 7th on factor 9 (culture, news, politics).

Framework was 7th on factor 4 (java, python ...) and 8th on factor 8 (ajax, plugin ... css ...).

A possible difference:

✔ for framework, a java framework and a css framework are two different things that share just a family resemblance, Wittgenstein (1953);
✔ for travel, photos and culture are two distinct dimensions of the same user experience.
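Tags like these could be flagged automatically by looking for items that rank near the top of more than one factor. The sketch below is a hypothetical helper, not part of the original method; the ranks for travel and framework are the ones quoted above, while the python entry and the cut-off of 10 are invented for contrast.

```python
def multi_factor_items(ranks, top_n, min_factors=2):
    """ranks maps item -> {factor number: rank}, where rank 1 is the best.

    Returns {item: [factor numbers]} for the items ranked within top_n
    on at least min_factors factors.
    """
    result = {}
    for item, by_factor in ranks.items():
        strong = sorted(f for f, rank in by_factor.items() if rank <= top_n)
        if len(strong) >= min_factors:
            result[item] = strong
    return result


ranks = {
    "travel": {9: 7, 10: 6},     # ranks quoted in the slides
    "framework": {4: 7, 8: 8},   # ranks quoted in the slides
    "python": {4: 2, 8: 140},    # hypothetical values, for contrast only
}
print(multi_factor_items(ranks, top_n=10))
```

Such a list would only point at candidates; deciding whether a tag is ambiguous (framework) or genuinely two-dimensional (travel) still needs a human reading.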

Page 28: From Clouds to Trees: Clustering Delicious Tags

Applications of the method

✔ A Delicious taxonomy
✔ A method to generate a taxonomy from tags

Page 29: From Clouds to Trees: Clustering Delicious Tags

A Delicious taxonomy

Example: the most popular links, grouped by categories.

Another possibility: a directory of categories.

Figure 7: see http://www.hyperlabs.net/tree/

Page 30: From Clouds to Trees: Clustering Delicious Tags

A method to generate a taxonomy from tags

This methodology can be applied to every social tagging system: for example, Flickr.

It is applicable to your site, if it uses tags, folksonomies or a controlled vocabulary.

Page 31: From Clouds to Trees: Clustering Delicious Tags

Methodological generalization

✔ First abstractions: card sorting, tags → matrix
✔ Second generalization: matrix → graph

Page 32: From Clouds to Trees: Clustering Delicious Tags

First abstractions: card sorting, tags → matrix

The co-occurrence matrix is generated in a way very similar to the matrix produced by the analysis of a card sorting.

The main difference: the item frequency bias in the folksonomy.

Therefore, the methods that can be applied to card sorting (see, for example, Bussolon (2009)) can also be applied to a tagging system, and vice versa. Statistical re-usability.

Page 33: From Clouds to Trees: Clustering Delicious Tags

Second generalization: matrix → graph

The co-occurrence matrices can be seen and represented as graphs.

There is an important research field devoted to graph clustering. Can it be applied to card sorting or folksonomies?
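To make the matrix → graph step concrete, here is a small sketch that reads a symmetric co-occurrence matrix as a weighted graph and groups the tags by connected components. Connected components are only the crudest form of graph clustering and stand in for the more refined methods the question refers to; the function names, the threshold and the toy matrix are assumptions.

```python
def matrix_to_graph(matrix, labels, threshold=1):
    """Turn a symmetric co-occurrence matrix into a weighted adjacency dict."""
    graph = {label: {} for label in labels}
    for i, row in enumerate(matrix):
        for j in range(i + 1, len(row)):
            # Keep an edge only when the two tags co-occur often enough.
            if row[j] >= threshold:
                graph[labels[i]][labels[j]] = row[j]
                graph[labels[j]][labels[i]] = row[j]
    return graph


def connected_components(graph):
    """Group the nodes that are reachable from one another."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical co-occurrence counts, invented for illustration.
labels = ["design", "webdesign", "python", "java"]
matrix = [
    [0, 5, 0, 0],
    [5, 0, 1, 0],
    [0, 1, 0, 4],
    [0, 0, 4, 0],
]
graph = matrix_to_graph(matrix, labels, threshold=2)
print(connected_components(graph))
```

With a threshold of 1 the weak webdesign-python link would merge everything into a single component, which shows why real graph clustering methods weigh edges instead of simply cutting them.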

Page 34: From Clouds to Trees: Clustering Delicious Tags

Theoretical results

✔ Discovery of the implicit dimensions
✔ Graphs and multi-dimensional spaces
✔ Clusters, dimensions and conjunctions
✔ The dynamic nature of the multispace
✔ Bookmarks as pheromones
✔ Tempus fugit ... questions?

Page 35: From Clouds to Trees: Clustering Delicious Tags

Discovery of the implicit dimensions

PCA on card sorting - Factor Analysis on folksonomies.

In both cases, the implicit dimensions of the mental models of the users do emerge.

Page 36: From Clouds to Trees: Clustering Delicious Tags

Graphs and multi-dimensional spaces

From a technological point of view, the Internet is a directed graph. From a cognitive point of view, however, this graph is useless, because it cannot be traversed by a human.

The Internet graph is information, but, as is, it is not knowledge.

The multi-dimensional space is a better approximation of the cognitive models of the users.

Page 37: From Clouds to Trees: Clustering Delicious Tags

Clusters, dimensions and conjunctions

The multi-dimensional space can be reduced to clusters for the most typical elements. Example: Paris and the Eiffel Tower can be reduced to a cluster.

When we are looking for elements that are a conjunction of more than one feature, however, the multi-dimensional space is a better representation.

Example: Argentine tango in Paris is the conjunction of two things usually belonging to two distinct clusters: Paris and Argentine tango. A multi-dimensional space can help us to surface resources that are salient for both the relevant dimensions, allowing the representation and the retrieval of the conjunctive elements.
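One simple way to model such a conjunction, offered here only as an assumption, is to score each resource by its weakest value across the requested dimensions, so that a resource must be salient on all of them to rank at all. The resource names and scores below are invented.

```python
def conjunctive_ranking(resources, dimensions):
    """Rank resources by their weakest score across the requested dimensions."""
    scored = [
        (min(scores.get(d, 0.0) for d in dimensions), name)
        for name, scores in resources.items()
    ]
    # Resources missing any requested dimension score 0 and are dropped.
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


# Hypothetical resources with a score on each dimension they belong to.
resources = {
    "eiffel-tower-guide": {"paris": 0.9},
    "tango-history": {"tango": 0.8},
    "milongas-in-paris": {"paris": 0.6, "tango": 0.7},
}
print(conjunctive_ranking(resources, ["paris", "tango"]))
```

A hard cluster assignment would have filed the third resource under either Paris or tango; keeping both coordinates is what makes the conjunctive query answerable.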

Page 38: From Clouds to Trees: Clustering Delicious Tags

The dynamic nature of the multispace


The multidimensional space is dynamic: the dimensions change. Some new dimensions can arise, while others get less attention.
This dynamic is consistent with Barsalou's (1983) theory of ad hoc categories.
The salience of the dimensions changes dynamically every time the goals of the users change.
The multispace that emerged from my analysis is the resultant vector of the multispaces of the 77000 users who saved the bookmarks in my Delicious corpus.
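One way to read the "resultant vector" is as a sum of per-user contributions. The sketch below is an interpretation, not a description of how the corpus was processed: the user names and bookmarks are invented, and summing per-user tag co-occurrence counts is an assumed stand-in for combining the users' multispaces.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(bookmarks):
    """Count how often each unordered pair of tags shares a bookmark."""
    counts = Counter()
    for tags in bookmarks:
        # Sort and de-duplicate so each pair has one canonical form.
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return counts


# Hypothetical users, each with a list of bookmarks (a bookmark is a tag list).
user_bookmarks = {
    "user_a": [["paris", "travel"], ["paris", "tango"]],
    "user_b": [["tango", "music"], ["paris", "travel"]],
}

# The aggregate is the element-wise sum of the per-user counts.
aggregate = Counter()
for bookmarks in user_bookmarks.values():
    aggregate += cooccurrences(bookmarks)

print(aggregate.most_common(1))
```

Because the aggregate is a sum, a pair of tags used together by many users outweighs a pair used by a single user, and the aggregate shifts as users add bookmarks that reflect new goals.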

Page 39: From Clouds to Trees: Clustering Delicious Tags

Bookmarks as pheromones


While moving through the environment, insects release pheromones, particular chemical components that work as turning signals for their conspecifics.
We can see pheromones as a sort of augmented reality for the insects, a form of augmented cognition (Chandrasekharan and Stewart, 2007).
Delicious users leave a track every time they save a bookmark as public. If we can map those tracks of digital pheromones, we can get, and give to other users, an augmented map of the cognitive multispace of the subset of resources tagged on Delicious.

Page 40: From Clouds to Trees: Clustering Delicious Tags

Tempus fugit ... questions?


Stefano Bussolon
www.bussolon.it
stefano@bussolon.it
@sweetdreamerit
LinkedIn
Facebook

Page 41: From Clouds to Trees: Clustering Delicious Tags

References


Page 42: From Clouds to Trees: Clustering Delicious Tags

References (1)


✔ Barsalou, L. W. (1983). Ad hoc categories. Memory and Cognition, 11(3):211-227

✔ Begelman, G., Keller, P., and Smadja, F. (2006). Automated Tag Clustering: Improving search and exploration in the tag space. In WWW 2006

✔ Bussolon, S. (2009). Card sorting, category validity, and contextual navigation. Journal of Information Architecture, 1(2):16-41

✔ Chandrasekharan, S. and Stewart, T. (2007). The origin of epistemic structures and proto-representations. Adaptive Behavior-Animals, Animats, Software Agents, Robots, Adaptive Systems, 15(3):329-353

Page 43: From Clouds to Trees: Clustering Delicious Tags

References (2)


✔ Guo, L., Jacob, E., and George, N. (2009). Longitudinal analysis of tag structure in del.icio.us

✔ Raychaudhuri, S., Stuart, J. M., and Altman, R. B. (2000). Principal components analysis to summarize microarray experiments: application to sporulation time series. Pacific Symposium on Biocomputing, 5:452-463

✔ Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., and Boyes Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8:382-439

✔ Salonen, J. (2007). Self-organising map based tag clouds. In Proceedings of 1st OPAALS workshop

Page 44: From Clouds to Trees: Clustering Delicious Tags

References (3)


✔ Shepitsen, A., Gemmell, J., Mobasher, B., and Burke, R. (2008). Personalized recommendation in social tagging systems using hierarchical clustering. In Proceedings of the 2008 ACM conference on Recommender systems, pages 259-266. ACM

✔ Wittgenstein, L. (1953). Philosophische Untersuchungen. Blackwell, Oxford. English translation Philosophical Investigations; trans. and ed. by G.E.M. Anscombe; second edition 1958

✔ Zhou, T. and King, I. (2009). Automobile, car and BMW: horizontal and hierarchical approach in social tagging systems. In Proceeding of the 2nd ACM workshop on Social web search and mining, pages 25-32. ACM