
Page 1

Social Tagging in Query Expansion: a new Way for Personalized Web Search

Claudio Biancalana, Alessandro Micarelli, Francesco Saverio Profiti

Department of Computer Science and Automation

Artificial Intelligence Laboratory

Rome Tre University

and

LAzio Innovazione Tecnologica S.p.A.

Page 2

Motivations

• Information overload

• Human Factors

• Semantics

• Vocabulary problem

• Ineffectiveness of Short Queries

Page 3

Query Expansion (1/2)

• The process of expanding a user query with additional related words and phrases

• In the context of web search engines, query expansion involves evaluating a user input typed into the search query area and expanding the search query to match additional documents

Page 4

Personalized Web Search Architecture

[Figure: architecture diagram. Components: User, Query, Search Engine, Visited pages, Implicit Feedback, User Model, Personalization. Query labels in the diagram: "a,b" and "a,b,c,d,e".]

Page 5

Query Expansion (2/2)

Original Query: Q = {q1, q2, …, qk, qk+1, …, qn}

Terms to add: Q+ = {e1, e2, …, em}

Terms to remove: Q- = {qk+1, …, qn}

Expanded Query: EQ = (Q ∪ Q+) - Q- = {q1, q2, …, qk, e1, e2, …, em}
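The set expression for the expanded query can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `expand_query` and the sample terms are hypothetical, and keeping terms in first-seen order is an added assumption (sets alone carry no order).

```python
def expand_query(query, to_add, to_remove):
    """Return EQ = (Q ∪ Q+) - Q-, keeping terms in first-seen order."""
    removed = set(to_remove)
    expanded = []
    for term in list(query) + list(to_add):
        # Keep a term only if it is not in Q- and not already emitted.
        if term not in removed and term not in expanded:
            expanded.append(term)
    return expanded


# Hypothetical terms, chosen only for illustration.
Q = ["amazon", "cheap", "the"]
Q_plus = ["buy", "book"]
Q_minus = ["the"]
EQ = expand_query(Q, Q_plus, Q_minus)
```

Here `EQ` keeps the first k original terms and appends the added ones, mirroring the set {q1, …, qk, e1, …, em}.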

Page 6

User modeling and personalization in web

• Personalization is tailoring a consumer product, electronic or written medium to a user, based on personal details or characteristics that the user provides

• In web search engines, personalization is tailoring search results based on the interests of a user

Page 7

Pre-processing

• HTML Tag Elimination

• Semantic Analysis: Monty POS tagger

◦ adjective, noun, proper noun, preposition

• Stop Word Elimination

• Stemming
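A rough sketch of the pre-processing steps, using only the Python standard library. Everything here is an assumption made for illustration: the stop-word list is a tiny sample, `naive_stem` is a crude stand-in for a real stemmer, and the POS-tagging step is left out because it needs an external tagger.

```python
import re
from html.parser import HTMLParser

# Tiny illustrative stop-word list; a real system would use a full one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


class _TextExtractor(HTMLParser):
    """Collects text nodes only, which drops every HTML tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def naive_stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(html):
    text = strip_html(html).lower()
    tokens = re.findall(r"[a-z]+", text)
    # POS filtering would go here; it is omitted in this sketch.
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


terms = preprocess("<p>The rivers of the <b>Amazon</b> basin</p>")
```

The order of the steps follows the slide: tags are removed first, stop words are filtered next, and the remaining tokens are stemmed.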

Page 8

User Model and Co-occurrence

• Co-occurrence is the extent to which two terms tend to appear simultaneously in the same context

• User Model: Co-occurrence terms matrix

[Figure: 5 × 5 co-occurrence terms matrix with rows and columns t1 … t5. Cell values in extraction order (cell positions not preserved): 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 9.0, 1.0, 4.0, 1.0, 2.0, 2.0, 1.0, 0.0, 3.0, 3.0, 4.0, 0.0, 2.0, 9.0]

How do we build the matrix values?

Page 9

Personalization and Query Expansion

• Method I

◦ Bigrams

• Method II

◦ Hyperspace Analogue to Language

• Method III

◦ Page Level co-occurrence

• Method IV

◦ Page Level co-occurrence and term proximity

Page 10

Method I - Bigrams

• The user model is built around the concept of bigrams, namely a pair consisting of two adjacent terms in the text of a web page. Two terms are considered co-occurring only if adjacent.

• The context of a term is thus exclusively limited to the term that is directly next to it, either to the left or to the right.
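The bigram rule can be sketched as follows. This is an illustrative reading of the slide, not the authors' code: the function name and sample tokens are hypothetical, counts are incremented by 1 per adjacent pair, and skipping identical neighbours is an added assumption.

```python
from collections import defaultdict


def bigram_cooccurrence(tokens):
    """Count co-occurrences of adjacent terms, symmetrically (left or right)."""
    matrix = defaultdict(lambda: defaultdict(float))
    for left, right in zip(tokens, tokens[1:]):
        if left == right:
            continue  # assumption: a term does not co-occur with itself
        matrix[left][right] += 1.0
        matrix[right][left] += 1.0
    return matrix


tokens = ["amazon", "river", "basin", "river"]
m = bigram_cooccurrence(tokens)
```

With these tokens, "river" and "basin" are adjacent twice, while "amazon" and "basin" are never adjacent and so never co-occur.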

Page 11

Method II - HAL

• Given a window of N terms, that can be scrolled inside a page text, two terms are considered co-occurring only if they are within such window.

• The co-occurrence value will be inversely proportional to the distance between the two terms within the window.

Example N=5

t1 t2 t3 t4 t5 t6

1/1 1/2 1/3 1/4 1/5
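A minimal sketch of the windowed weighting, assuming the weight of a pair at distance d is exactly 1/d and that the matrix is kept symmetric. Both choices are assumptions of this example; the function name is hypothetical.

```python
from collections import defaultdict


def hal_cooccurrence(tokens, window=5):
    """Accumulate 1/distance for every pair of terms at most `window` apart."""
    matrix = defaultdict(lambda: defaultdict(float))
    for i, term in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i + distance
            if j >= len(tokens):
                break
            other = tokens[j]
            if other == term:
                continue  # assumption: no self co-occurrence
            weight = 1.0 / distance
            matrix[term][other] += weight
            matrix[other][term] += weight
    return matrix


tokens = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
m = hal_cooccurrence(tokens, window=5)
```

For t1 this reproduces the weights 1/1 … 1/5 toward t2 … t6, while t7 lies outside the window and receives nothing.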

Page 12

Method III - Page Level co-oc (1/2)

• Within this method, the context of a term is expanded to the entire page considered.

• Two terms are then deemed co-occurring only if they are both present, simultaneously, in the same page.

Page 13

Method III - Page Level co-oc (2/2)

• For each document, a co-occurrence matrix is generated; the matrices are then summed into a single matrix

• POS tagger extracts the nouns, proper nouns and adjectives

• Only the first k keywords are used, following an order based on tf*idf

[Figure: co-occurrence matrix and weighted co-occurrence matrix]
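The page-level construction might look like the sketch below. It assumes documents are already pre-processed token lists, uses tf × log(N/df) as the tf*idf variant, and adds 1 for each keyword pair found in the same page. The function names, the sample documents and the alphabetical tie-break are all assumptions, and the weighted matrix mentioned on the slide is not modelled.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def top_k_keywords(doc, docs, k):
    """Return the k terms of `doc` with the highest tf*idf over `docs`."""
    tf = Counter(doc)
    n_docs = len(docs)

    def tfidf(term):
        df = sum(1 for d in docs if term in d)
        return tf[term] * math.log(n_docs / df)

    # Sort by score (descending), then alphabetically for a stable order.
    ranked = sorted(tf, key=lambda t: (-tfidf(t), t))
    return ranked[:k]


def page_level_cooccurrence(docs, k=3):
    """Sum the per-page co-occurrence matrices into a single matrix."""
    total = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        keywords = top_k_keywords(doc, docs, k)
        for a, b in combinations(keywords, 2):
            total[a][b] += 1.0
            total[b][a] += 1.0
    return total


# Hypothetical pre-processed pages, for illustration only.
docs = [
    ["amazon", "river", "forest", "river"],
    ["amazon", "book", "buy"],
    ["amazon", "river", "basin"],
]
m = page_level_cooccurrence(docs, k=3)
```

Lowering k makes the tf*idf cut visible: with k=2 the term "amazon", which appears in every page and so has idf 0, no longer enters the matrix.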

Page 14

Method IV - Page Level co-oc and term proximity

Co-occurrence at page level (method III) + Term Proximity

t1 t2 t3 t4 t5 t6

1/1 1/2 1/3 1/4 1/5

Page 15

Personalization and Query Expansion

• In this research we implement method III

• Method III performs better than the others, as we show in:

[1] C. Biancalana, A. Micarelli, A. Lapolla, "Personalized Web Search using Correlation Matrix for Query Expansion" in Joaquim Filipe, José Cordeiro, Vitor Pedrosa (Eds.): "Web Information Systems and Technologies", LNBIP, 2009

Page 16

Query Expansion

• In the query expansion process, we select the rows representing the original query terms (e.g. t2, t3).

Q = t2, t3

[Figure: the 5 × 5 co-occurrence matrix over terms t1 … t5, with the same cell values as the matrix on Page 8]

Page 17

Query Expansion

• Sum up the selected rows

• Select the first N terms (those with the highest values) of the new vector.

Example: N = 1, Q = t2, t3

Summed vector over t1 … t5: 1.0 3.0 3.0 11 0.0

Selected term: t4

[Figure: the 5 × 5 co-occurrence matrix over terms t1 … t5, with the same cell values as the matrix on Page 8]
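The row-sum selection can be sketched as below. The matrix is illustrative only: the slide's exact cell layout is not preserved in this transcript, so the rows were chosen so that t2 + t3 reproduces the slide's summed vector 1.0 3.0 3.0 11 0.0. The function name and the choice to skip terms already in the query are assumptions.

```python
TERMS = ["t1", "t2", "t3", "t4", "t5"]

# Illustrative symmetric matrix (not necessarily the slide's layout); rows t2
# and t3 are chosen so that their sum is [1.0, 3.0, 3.0, 11.0, 0.0].
MATRIX = [
    [0.0, 0.0, 1.0, 4.0, 2.0],  # t1
    [0.0, 0.0, 3.0, 9.0, 0.0],  # t2
    [1.0, 3.0, 0.0, 2.0, 0.0],  # t3
    [4.0, 9.0, 2.0, 0.0, 1.0],  # t4
    [2.0, 0.0, 0.0, 1.0, 0.0],  # t5
]


def expand(query, n):
    """Sum the rows of the query terms and return the n best new terms."""
    rows = [MATRIX[TERMS.index(t)] for t in query]
    summed = [sum(col) for col in zip(*rows)]
    # Rank candidates by summed value, skipping terms already in the query
    # (skipping them is an assumption of this sketch).
    candidates = [(v, t) for v, t in zip(summed, TERMS) if t not in query]
    candidates.sort(key=lambda pair: -pair[0])
    return summed, [t for _, t in candidates[:n]]


summed, added = expand(["t2", "t3"], n=1)
```

With N = 1 the highest entry of the summed vector belongs to t4, which is the term added to the query in the slide's example.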

Page 18

Limits of co-occurrence matrices

• Semantic aspects:

◦ In particular polysemy and homonymy

• For example:

◦ User query: “amazon”

◦ Expanded query: “amazon buy river”

◦ Possible results:

http://www.amazon.com/

http://en.wikipedia.org/wiki/Amazon_River

Page 19

Our solution for the limits of co-occurrence matrices

• Extension of the co-occurrence matrix:

◦ Introduction of metadata as third dimension of the matrix

• Use of Social Bookmarking services for metadata retrieval:

◦ del.icio.us

◦ stumbleupon.com

◦ …

Page 20

Three-dimensional co-occurrence matrix structure

Page 21

Example

• User query: amazon

Category: e-commerce

Expanded Query: amazon AND buy AND (book OR books)

Results:

http://www.amazon.com/

http://www.amazon.co.uk/

Category: nature

Expanded Query: amazon AND (river OR rivers)

Results:

http://en.wikipedia.org/wiki/Amazon_River

http://www.mbarron.net/Amazon/
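One way to picture the third dimension is a separate term-by-term matrix per tag, so that the same query term expands differently in each category. The sketch below is only a guess at such a structure: the nested-dictionary layout, the function names and the tagged pages are hypothetical, and building the Boolean AND/OR query string is left out.

```python
from collections import defaultdict

# cooc[tag][term][other_term] -> co-occurrence weight under that tag
cooc = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))


def add_page(tags, keywords):
    """Record page-level co-occurrences once per tag attached to the page."""
    for tag in tags:
        for a in keywords:
            for b in keywords:
                if a != b:
                    cooc[tag][a][b] += 1.0


def expand_by_category(term, n=1):
    """Return {tag: top-n co-occurring terms} for every tag that knows `term`."""
    result = {}
    for tag, matrix in cooc.items():
        if term in matrix:
            row = matrix[term]
            ranked = sorted(row, key=lambda t: (-row[t], t))
            result[tag] = ranked[:n]
    return result


# Hypothetical tagged pages, for illustration only.
add_page(["e-commerce"], ["amazon", "buy", "book"])
add_page(["e-commerce"], ["amazon", "buy"])
add_page(["nature"], ["amazon", "river"])

expansions = expand_by_category("amazon", n=1)
```

Keeping one slice per tag is what separates the "buy" sense of amazon from the "river" sense, instead of mixing both into a single expanded query.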

Page 22

Experimentations

• The employed benchmark: Lazio Region Portal Data (LRPD)

• An example of a topic is Top/Sala Stampa/Presidente/Biografia:

◦ Level I: Sala Stampa;

◦ Level II: Presidente;

◦ Level III: Biografia

• Given the large quantity of links contained in LRPD, we decided to consider only level III links

Page 23

Experimentations

• Lazio is a region situated in central Italy, whose largest city is Rome

• Lazio Region is also the name of the public administration that governs the citizens of this region

• Like most Italian public administrations, Lazio has a web portal through which it provides e-government services to its citizens

Page 24

Experimentations

[Figure: topic hierarchy with labels Top, Level I, Level II, Level III]

Page 25

Experimentations

• Each topic’s links were then subdivided into a training set, corresponding to 25% of the links, and a test set, corresponding to 75% of the links
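A 25% / 75% split of one topic's links could be sketched as below. The slides do not say how links were assigned to each set, so the seeded random shuffle, the function name and the example URLs are assumptions.

```python
import random


def split_links(links, train_fraction=0.25, seed=0):
    """Shuffle a topic's links and split them into (training, test) lists."""
    shuffled = list(links)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical links for a single topic.
links = [f"http://example.org/page{i}" for i in range(8)]
train, test = split_links(links)
```

The split is applied per topic, so every topic contributes to both the training set and the test set.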

Page 26

Experimentations

• For the experiments we use:

◦ Page Level Co-oc metric for the co-occurrence matrix construction

◦ del.icio.us Social Bookmarking service

◦ F1-measure as performance indicator [formula image not preserved], defined in terms of the number of returned links belonging to topic t (only the first 50 pages are taken into consideration in our tests) and the overall number of test links belonging to topic t present in the index.
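Since the formula itself is missing from the transcript, the sketch below falls back on the standard definition F1 = 2PR / (P + R). Taking precision as hits divided by the 50-result cutoff and recall as hits divided by the topic's test links in the index is an assumption that fits the description, not a confirmed reading of the slide. All names are hypothetical.

```python
def f1_measure(returned_in_topic, total_in_topic, cutoff=50):
    """F1 from precision = hits/cutoff and recall = hits/total (assumed forms)."""
    if returned_in_topic == 0 or total_in_topic == 0:
        return 0.0
    precision = returned_in_topic / cutoff
    recall = returned_in_topic / total_in_topic
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 20 of the first 50 results belong to topic t,
# out of 40 test links for t present in the index.
score = f1_measure(20, 40)
```

With these counts precision is 0.4 and recall is 0.5, so the score is their harmonic mean.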

Page 27

Experimentations

• We have compared our system with:

◦ a system based on a traditional content-based user-modeling approach, where documents are represented in the Vector Space Model and without Query Expansion (no QE)

◦ a system that focuses on updating the user model by means of Relevance Feedback (RF) techniques (no Social Bookmarking)

Page 28

Experimentation

• The following table shows the results obtained: