© 2004 Chris Staff, CSAW’04, University of Malta

Expanding Query Terms in Context

Chris Staff and Robert Muscat

Department of Computer Science & AI

University of Malta


Aims of this presentation

• Background

– The Vocabulary Problem in IR

• Scenario

– Using retrieved documents to determine how to expand the query

• Approach

• Evaluation


The Vocabulary Problem

• Furnas et al. (1987) find that any two people describe the same concept/object using the same term with a probability of less than 0.2

• This is a huge problem for IR

– High probability of finding some documents about your term (but watch out for ambiguous terms!)

– Low probability of finding all documents about your concept (so low ‘coverage’)


What’s Query Expansion?

• Adding terms to query to improve recall while keeping precision high

• Recall is 1 when all relevant docs are retrieved

• Precision is 1 when all retrieved docs are relevant
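
The two measures above can be sketched in a few lines. The document ids are hypothetical and serve only to show the trade-off: a short result list can be fully precise yet miss half the relevant documents, while a longer list can reach full recall at the cost of precision.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)                      # relevant docs that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document ids, for illustration only.
relevant = {"d1", "d2", "d3", "d4"}
narrow = {"d1", "d2"}
broad = {"d1", "d2", "d3", "d4", "d9", "d10", "d11", "d12"}

print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 1.0)
```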


What’s Query Expansion?

• Attempts to improve recall (by adding synonyms) usually rely on a constructed thesaurus (Qiu et al., 1995; Mandala et al., 1999; Voorhees, 1994)

• Attempts to improve precision (by adding restricting terms) are now based on automatic relevance feedback (e.g., Mitra et al., 1998)

• Indiscriminate query expansion can lead to loss of precision (Voorhees, 1994) or hurt recall


Scenario

• Two users search for information related to the same concept C

• User queries Q1 and Q2 have no terms in common

• R1 and R2 are results sets of Q1 and Q2 respectively

• Rcommon = R1 ∩ R2
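
As a small illustration of the scenario, with made-up queries and result sets, the common results are simply the set intersection:

```python
# Hypothetical queries and result sets for two users interested in the same concept.
Q1 = {"car", "hire"}
Q2 = {"automobile", "rent"}
R1 = {"d1", "d2", "d3", "d7"}
R2 = {"d3", "d5", "d7", "d8"}

assert Q1.isdisjoint(Q2)   # the two queries share no terms
R_common = R1 & R2         # documents retrieved by both queries
print(sorted(R_common))    # ['d3', 'd7']
```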


Scenario

• We assume that Rcommon is small and non-empty (Furnas, 1985; Furnas et al., 1987)

• If Rcommon is large then Q1 and Q2 will both retrieve the same set of documents

• We can determine (using WordNet) whether any term in Q1 is a synonym of a term in Q2

– Some doc Dk in Rcommon probably includes both terms (because of the way Web IR works)!
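
The synonym check can be sketched as follows. To keep the example self-contained, a tiny hand-written synset table stands in for WordNet; the table, the function names, and the document contents are all assumptions made for illustration.

```python
# Toy stand-in for WordNet: each frozenset is one hypothetical synset.
TOY_SYNSETS = [
    frozenset({"car", "automobile", "auto"}),
    frozenset({"hire", "rent", "lease"}),
]

def are_synonyms(t1, t2, synsets=TOY_SYNSETS):
    """True if two distinct terms share at least one synset."""
    return t1 != t2 and any(t1 in s and t2 in s for s in synsets)

def synonym_pairs(q1, q2):
    """All (t1, t2) with t1 in q1 and t2 in q2 that are synonyms."""
    return sorted((t1, t2) for t1 in q1 for t2 in q2 if are_synonyms(t1, t2))

def docs_with_both(pair, r_common, doc_terms):
    """Documents in Rcommon whose text contains both terms of the pair."""
    t1, t2 = pair
    return sorted(d for d in r_common if {t1, t2} <= doc_terms[d])

q1, q2 = {"car", "hire"}, {"automobile", "rent"}
doc_terms = {"d3": {"car", "automobile", "rent"}, "d7": {"car", "hire"}}

pairs = synonym_pairs(q1, q2)
print(pairs)                                               # [('car', 'automobile'), ('hire', 'rent')]
print(docs_with_both(pairs[0], {"d3", "d7"}, doc_terms))   # ['d3']
```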


Scenario

• If t1 in Q1 and t2 in Q2 are synonyms

– We can expand either term in future queries containing t1 or t2

– As long as doc Dk appears in the results set (the context)
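
A minimal sketch of this context condition, assuming a hypothetical `learned` mapping from each term to (synonym, context document) pairs: a synonym is added only when its context document shows up in the results of the new query.

```python
def expand_in_context(query, results, learned):
    """Add learned synonyms to a query only when their context document is in the results.

    `learned` maps a term to a list of (synonym, context_doc) pairs; this layout is hypothetical.
    """
    expanded = list(query)
    for term in query:
        for synonym, context_doc in learned.get(term, []):
            if context_doc in results and synonym not in expanded:
                expanded.append(synonym)
    return expanded

learned = {"car": [("automobile", "d3")], "automobile": [("car", "d3")]}

# d3 is in the results, so "car" is expanded.
print(expand_in_context(["car", "insurance"], {"d3", "d9"}, learned))  # ['car', 'insurance', 'automobile']
# d3 is absent, so the query is left unchanged.
print(expand_in_context(["car", "racing"], {"d4", "d9"}, learned))     # ['car', 'racing']
```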


Approach

• ‘Learning’ synonyms in context

• Query Expansion


‘Learning’ Synonyms in Context

• Each document is associated with a “bag of words”: the terms ever used to retrieve it

• Each (term, document) pair is associated with a synset for the term in the context of the doc

– The word sense from WordNet is also recorded to reduce ambiguity
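
One possible way to hold this learned information is sketched below. The class and method names are hypothetical, and the stored synset is reduced to a (word category, word sense, synonyms) triple for illustration.

```python
from collections import defaultdict

class ContextStore:
    """Hypothetical store for what is learned as queries are answered."""

    def __init__(self):
        self.bag_of_words = defaultdict(set)  # doc id -> all query terms ever used to retrieve it
        self.synsets = {}                     # (term, doc id) -> (category, sense, synonyms)

    def record_retrieval(self, query_terms, doc_id):
        self.bag_of_words[doc_id].update(query_terms)

    def record_synset(self, term, doc_id, category, sense, synonyms):
        self.synsets[(term, doc_id)] = (category, sense, frozenset(synonyms))

store = ContextStore()
store.record_retrieval(["car", "hire"], "d3")
store.record_retrieval(["automobile", "rent"], "d3")
store.record_synset("car", "d3", "noun", 1, {"car", "automobile", "auto"})

print(sorted(store.bag_of_words["d3"]))    # ['automobile', 'car', 'hire', 'rent']
print(store.synsets[("car", "d3")][:2])    # ('noun', 1)
```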


Query Expansion in Context

• Submit the unexpanded original user query Q to obtain results set R

• For each document Dk in R (k is the rank), retrieve the synsets for the terms in Q

• The same query term in the context of different docs in R may yield inconsistent synsets

– Countered using Inverse Document Relevance


Inverse Document Relevance

• IDR is the relative frequency with which doc d is retrieved in rank k when term q occurs in the query

• IDR_{q,d} = W_{q,d} / W_d (where W_d is the number of times d is retrieved, and W_{q,d} is the number of times d is retrieved when q occurs in the query)
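
A minimal sketch of the IDR ratio, assuming the counts are taken from a query log of (query terms, retrieved documents) pairs. The log format and its contents are hypothetical; the ratio as written above does not use the rank, so neither does this sketch.

```python
def idr(query_log, q, d):
    """IDR_{q,d} = W_{q,d} / W_d, computed from (query_terms, retrieved_docs) pairs."""
    w_d = sum(1 for _, docs in query_log if d in docs)                      # times d was retrieved
    w_qd = sum(1 for terms, docs in query_log if d in docs and q in terms)  # ... with q in the query
    return w_qd / w_d if w_d else 0.0

# Hypothetical query log, for illustration only.
log = [
    ({"car", "hire"}, ["d3", "d7"]),
    ({"car", "insurance"}, ["d3", "d9"]),
    ({"automobile", "rent"}, ["d3", "d5"]),
    ({"boat", "hire"}, ["d7"]),
]

print(idr(log, "car", "d3"))   # d3 was retrieved 3 times, 2 of them with "car" in the query
print(idr(log, "hire", "d7"))  # d7 was retrieved 2 times, both with "hire" in the query
```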


Term Document Relevance

• We then re-rank documents in R based on their TDR

• TDR_{q,d,k} = IDR_{q,d} × W_{q,d,k} / W_{d,k}

• Synsets of the top-10 re-ranked documents are merged according to word category and sense

• The synset of the most frequently occurring (word category, word sense) pair is used to expand q in the query
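
The re-ranking and merging steps might look roughly like the sketch below. The slides do not define W_{d,k} and W_{q,d,k}; reading them as "times d was retrieved at rank k" (with q in the query, for the latter) is an assumption by analogy with IDR. The log format, the synset layout, and the function names are also hypothetical.

```python
from collections import Counter

def tdr(log, q, d, k):
    """TDR_{q,d,k} = IDR_{q,d} * W_{q,d,k} / W_{d,k}.

    `log` is a list of (query_terms, ranked_docs). The rank-specific counts
    follow an assumed reading; the source does not define them.
    """
    w_d = w_qd = w_dk = w_qdk = 0
    for terms, ranked in log:
        if d not in ranked:
            continue
        w_d += 1
        w_qd += q in terms
        if ranked.index(d) + 1 == k:      # d appeared at rank k (1-based)
            w_dk += 1
            w_qdk += q in terms
    if w_d == 0 or w_dk == 0:
        return 0.0
    return (w_qd / w_d) * (w_qdk / w_dk)

def choose_expansion(log, q, ranked, synsets, top_n=10):
    """Re-rank by TDR, merge synsets of the top documents, and pick the most frequent sense."""
    scored = [(tdr(log, q, d, k), d) for k, d in enumerate(ranked, start=1)]
    top = [d for _, d in sorted(scored, key=lambda s: -s[0])[:top_n]]
    votes, merged = Counter(), {}
    for d in top:
        if (q, d) in synsets:
            category, sense, words = synsets[(q, d)]
            votes[(category, sense)] += 1
            merged.setdefault((category, sense), set()).update(words)
    if not votes:
        return set()
    best = votes.most_common(1)[0][0]
    return merged[best] - {q}             # expansion terms, excluding q itself

# Hypothetical log and learned synsets, for illustration only.
log = [
    ({"car", "hire"}, ["d3", "d7"]),
    ({"car", "insurance"}, ["d3", "d9"]),
    ({"car", "racing"}, ["d4", "d3"]),
]
synsets = {
    ("car", "d3"): ("noun", 1, {"car", "automobile"}),
    ("car", "d7"): ("noun", 1, {"car", "auto"}),
    ("car", "d4"): ("noun", 2, {"car", "railcar"}),
}

print(sorted(choose_expansion(log, "car", ["d4", "d3", "d7"], synsets)))  # ['auto', 'automobile']
```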


Evaluation

• Ideally, we need a huge query log with relevance judgements for the queries

• We have the TREC QA collection, but we’ll need to index it before running the test queries through it (using, e.g., SMART)

– Disadvantage: there might not be enough queries

• User Studies


Thank you!
