Pula, 5 June 2007

Complex networks in tagging systems

Andrea Capocci

Dipartimento di Informatica e Sistemistica, Università di Roma "Sapienza"

Tag networks

www.citeulike.org

Users save scientific publications and tag them with tags (keywords).

Other examples:

Flickr.com (photos)

del.icio.us (bookmarks)

Connotea.org, BibSonomy (papers)

TAGS

Tagging systems as tripartite networks

Tag assignment

A tagging system is a set of tag assignments. A tag assignment is a triplet (user, resource, tag).

CiteULike

550k tag assignments

48k distinct tags

180k distinct papers

6k distinct users
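As a minimal sketch of the triplet representation, the snippet below stores a few tag assignments as (user, resource, tag) records and counts the distinct users, resources, and tags, which is how figures like the ones above could be obtained. The class name `TagAssignment`, the function `summarize`, and the toy data are made up for illustration; they are not CiteULike data.

```python
from typing import NamedTuple


class TagAssignment(NamedTuple):
    user: str
    resource: str
    tag: str


def summarize(assignments):
    """Count assignments and the distinct users, resources, and tags."""
    return {
        "assignments": len(assignments),
        "users": len({a.user for a in assignments}),
        "resources": len({a.resource for a in assignments}),
        "tags": len({a.tag for a in assignments}),
    }


# Hypothetical toy data, not taken from CiteULike.
toy = [
    TagAssignment("alice", "paper-1", "networks"),
    TagAssignment("alice", "paper-1", "scale-free"),
    TagAssignment("bob", "paper-1", "networks"),
    TagAssignment("bob", "paper-2", "tagging"),
]
print(summarize(toy))
```

Each of the three fields defines one node type of the tripartite network, and every assignment links one node of each type.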

Text analysis of tagging

The stream of tags can be interpreted as a text continuously written by collaborative users.

Zipf laws, preferential attachment and Yule processes in tag streams?

For del.icio.us, see Cattuto et al.

Sub-linear vocabulary growth

[Plot: # of tags versus internal time. For del.icio.us the number of tags grows as x^0.8.]

Tag frequency distribution

Preferential attachment

Few tags per resource

Where is semantics?

Such properties can be modeled by Yule-Simon processes with memory (see Cattuto et al.)
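For orientation, here is a minimal sketch of a plain Yule-Simon process. It deliberately leaves out the memory kernel of Cattuto et al., so it is not their model. At each step a brand-new tag is emitted with probability `p_new`; otherwise a tag is copied from a uniformly chosen earlier position in the stream, so each existing tag is reused with probability proportional to its current frequency. The function name, the parameter values, and the seed are illustrative assumptions.

```python
import random
from collections import Counter


def yule_simon_stream(steps, p_new, seed=0):
    """Generate a tag stream with a plain Yule-Simon process (no memory kernel)."""
    rng = random.Random(seed)
    stream = [0]           # the stream starts with a single tag, labeled 0
    next_tag = 1
    vocabulary_size = [1]  # number of distinct tags after each step
    for _ in range(steps - 1):
        if rng.random() < p_new:
            stream.append(next_tag)  # invent a new tag
            next_tag += 1
        else:
            # Copying a uniformly chosen past entry selects each tag with
            # probability proportional to its current frequency.
            stream.append(rng.choice(stream))
        vocabulary_size.append(next_tag)
    return stream, vocabulary_size


stream, vocab = yule_simon_stream(steps=20000, p_new=0.05)
frequencies = sorted(Counter(stream).values(), reverse=True)
print("distinct tags:", vocab[-1])
print("top-5 tag frequencies:", frequencies[:5])
```

With a constant `p_new` the vocabulary grows linearly on average, so this plain version does not by itself reproduce sub-linear vocabulary growth.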

But such analysis does not capture the semantics of tags: hierarchical relations etc.

Why does semantics matter?

Detection of tag categories.

Understanding users' strategies to improve the system, propose new services.

Spam detection.

Tag co-occurrence network

Tags are nodes.

If two tags are assigned to the same resource, one puts an edge between the two tags.

Edges are weighted: each co-assignment of two tags increases the edge weight by one.

Node strength (the sum of the weights of a node's edges) is used instead of degree.
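The construction above can be sketched in a few lines of Python. One detail is an assumption of this sketch: a pair of tags is counted once per resource on which both appear, regardless of which users assigned them. Counting once per (user, resource) post would be an equally plausible reading. The function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(assignments):
    """Build weighted tag co-occurrence edges from (user, resource, tag) triplets.

    Assumption: a pair of tags counts once per resource on which both appear.
    """
    tags_by_resource = defaultdict(set)
    for _user, resource, tag in assignments:
        tags_by_resource[resource].add(tag)

    weights = defaultdict(int)  # (tag_a, tag_b) with tag_a < tag_b -> weight
    for tags in tags_by_resource.values():
        for a, b in combinations(sorted(tags), 2):
            weights[(a, b)] += 1
    return dict(weights)


def strengths(weights):
    """Strength of a node = sum of the weights of its edges."""
    s = defaultdict(int)
    for (a, b), w in weights.items():
        s[a] += w
        s[b] += w
    return dict(s)


# Hypothetical toy data.
toy = [
    ("alice", "paper-1", "networks"),
    ("alice", "paper-1", "scale-free"),
    ("bob", "paper-2", "networks"),
    ("bob", "paper-2", "scale-free"),
    ("bob", "paper-2", "www"),
]
edges = cooccurrence_network(toy)
print(edges)
print(strengths(edges))
```

In this toy network every tag has degree 2, but "networks" and "scale-free" have strength 3 while "www" has strength 2, which is the kind of difference that degree alone hides.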

Distribution of strength

Nontrivial clustering & spam detection

Clustering coefficient C(k): average density of triangles around nodes with degree k.
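A minimal sketch of how C(k) can be computed on an unweighted graph follows. For each node, the local clustering coefficient is the fraction of pairs of neighbors that are themselves linked; C(k) averages this over all nodes of degree k. Skipping nodes with degree below 2 is a choice made for this sketch, and the function name and toy graph are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def clustering_by_degree(edges):
    """Return {k: average local clustering coefficient of nodes with degree k}.

    `edges` is an iterable of undirected (node_a, node_b) pairs; weights are ignored.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)

    by_degree = defaultdict(list)
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            continue  # assumption: skip nodes where no triangle is possible
        # Count the links that exist among the neighbors of this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        by_degree[k].append(2.0 * links / (k * (k - 1)))

    return {k: sum(values) / len(values) for k, values in sorted(by_degree.items())}


# Hypothetical toy graph: a triangle (a, b, c) plus a pendant node d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
result = clustering_by_degree(toy_edges)
print(result)
```

An anomalous value of C(k) at a single degree, such as the k = 502 case in these slides, can then be inspected by listing the nodes that have that degree.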

[Plot: C(k) versus k, with the point at k = 502 marked as spam]

Looking for a k = 502 page...

SPAM

Co-occurrence networks and semantics

Co-occurrence networks are scale-free.

The significance of this statistical property is ambiguous.

Clustering encodes semantics (?)

Clustering can be used to detect spam.

Users' strategies

Do users tag resources according to a conceptual hierarchy of tags?

Semantics and hierarchy

For example

"Emergence of scaling in random networks" by A.-L. Barabási and R. Albert

scale-free networks, networks: HIERARCHICAL

scale-free networks, WWW: NON HIERARCHICAL

Model based on hierarchy

Conjectures

1. Tags have an underlying hierarchy.

2. With high probability, users add tags hierarchically.

Can we reproduce the co-occurrence network structure based on tag hierarchy?

Model based on hierarchy

The underlying hierarchy is a random tree.

At each time step, we add a new resource, with two tags.

New tags are introduced with probability P_nt.

With probability P_sb, the second tag is a "generalization" of the first tag; otherwise it is chosen randomly.
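The rules above leave several details open, so the following is only a sketch under stated assumptions, not the author's implementation. Assumed here: the hierarchy grows as a random recursive tree (a new tag becomes the child of a uniformly chosen existing tag); the first tag of a resource is new with probability `p_nt` and otherwise drawn uniformly from the existing tags; the "generalization" of a tag is its parent in the tree. The parameters `p_nt` and `p_sb` stand for the two probabilities above, and their values below are arbitrary.

```python
import random
from collections import defaultdict


def simulate(steps, p_nt, p_sb, seed=0):
    """Simulate a hierarchy-based tagging model; return the tree and weighted edges.

    Assumptions of this sketch (not specified in the slides):
    - a new tag is attached under a uniformly chosen existing tag;
    - the first tag is new with probability p_nt, else uniform over existing tags;
    - the "generalization" of a tag is its parent in the tree.
    """
    rng = random.Random(seed)
    parent = {0: None}  # tag 0 is the root of the hierarchy
    weights = defaultdict(int)

    for _ in range(steps):
        if rng.random() < p_nt:
            first = len(parent)
            parent[first] = rng.randrange(first)  # attach under a random existing tag
        else:
            first = rng.randrange(len(parent))

        if rng.random() < p_sb and parent[first] is not None:
            second = parent[first]               # hierarchical choice
        else:
            second = rng.randrange(len(parent))  # random choice

        if first != second:  # a resource tagged twice with one tag adds no edge
            edge = (min(first, second), max(first, second))
            weights[edge] += 1
    return parent, dict(weights)


parent, weights = simulate(steps=10000, p_nt=0.1, p_sb=0.8)
strength = defaultdict(int)
for (a, b), w in weights.items():
    strength[a] += w
    strength[b] += w
print("tags:", len(parent), "edges:", len(weights))
print("largest strengths:", sorted(strength.values(), reverse=True)[:5])
```

The resulting strength values and the edge list can be compared with the measured strength distribution and clustering of the real co-occurrence network.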

Results: strength distribution

Results: clustering

Conclusions

Tagging systems display nontrivial statistical properties, such as Zipf laws.

Co-occurrence networks are a way of discovering semantic relationships between tags (?)

Clustering in co-occurrence networks encodes semantics (?) and detects spam.

Simple models based on hierarchy partially explain such properties.

Thank you, and thanks to...

Guido Caldarelli

The TAGORA group (Cattuto et al.)