© 2004 Chris Staff, CSAW’04, University of Malta (cstaff@cs.um.edu.mt)
Expanding Query Terms in Context
Chris Staff and Robert Muscat
Department of Computer Science & AI
University of Malta
Aims of this presentation
• Background – The Vocabulary Problem in IR
• Scenario
– Using retrieved documents to determine how to expand a query
• Approach
• Evaluation
The Vocabulary Problem
• Furnas et al. (1987) found that two people use the same term to describe the same object or concept with a probability of less than 0.2
• This is a huge problem for IR
– High probability of finding some documents about your term (but watch out for ambiguous terms!)
– Low probability of finding all documents about your concept (hence low ‘coverage’)
What’s Query Expansion?
• Adding terms to a query to improve recall while keeping precision high
• Recall is 1 when all relevant docs are retrieved
• Precision is 1 when all retrieved docs are relevant
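The two definitions can be made concrete with a small sketch that computes precision and recall for a single query from sets of document ids. The function name and the document ids are hypothetical. Returning 0.0 for an empty set is a convention assumed here; the slides do not specify it.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Precision is 1 when every retrieved document is relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall is 1 when every relevant document is retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# An expansion that adds d4 (relevant) and d5 (not relevant):
relevant = {"d1", "d2", "d4"}
print(precision_recall({"d1", "d2"}, relevant))              # → (1.0, 0.6666666666666666)
print(precision_recall({"d1", "d2", "d4", "d5"}, relevant))  # → (0.75, 1.0)
```

In this toy case the expansion raises recall to 1 at the cost of some precision. This is the trade-off query expansion has to manage.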
What’s Query Expansion?
• Attempts to improve recall (by adding synonyms) usually involve a constructed thesaurus (Qiu et al., 1995; Mandala et al., 1999; Voorhees, 1994)
• Attempts to improve precision (by adding restricting terms) are now based around automatic relevance feedback (e.g., Mitra et al., 1998)
• Indiscriminate query expansion can lead to a loss of precision (Voorhees, 1994) or hurt recall
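Here is a toy illustration of why indiscriminate expansion can cost precision. A thesaurus that lists synonyms from every sense of a term expands blindly, with no notion of which sense the user meant. The thesaurus contents and the function name below are hypothetical; a real system would draw on a constructed thesaurus or on WordNet.

```python
# Hypothetical toy thesaurus: synonyms from *all* senses of each term are listed
# together, so expansion cannot tell the financial "bank" from the river "bank".
THESAURUS = {
    "bank": ["depository", "riverbank", "slope"],
    "interest": ["involvement", "stake", "interest rate"],
}


def expand_indiscriminately(query):
    """Append every thesaurus synonym of every query term, without duplicates."""
    expanded = list(query)
    for term in query:
        for synonym in THESAURUS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


print(expand_indiscriminately(["bank", "interest"]))
# → ['bank', 'interest', 'depository', 'riverbank', 'slope', 'involvement', 'stake', 'interest rate']
```

For a query about savings accounts, terms such as "riverbank" and "slope" would pull in off-topic documents. This is the kind of precision loss the slide attributes to indiscriminate expansion.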
Scenario
• Two users search for information related to the same concept C
• User queries Q1 and Q2 have no terms in common
• R1 and R2 are the results sets of Q1 and Q2, respectively
• Rcommon = R1 ∩ R2
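The scenario's sets can be written down directly. This minimal sketch (function name and data hypothetical) first checks the precondition that Q1 and Q2 share no terms. It then returns Rcommon as the intersection of the two results sets.

```python
def common_results(q1, q2, r1, r2):
    """Return Rcommon = R1 ∩ R2 for two queries that share no terms."""
    if set(q1) & set(q2):
        raise ValueError("the scenario assumes Q1 and Q2 have no terms in common")
    # Documents retrieved by both queries.
    return set(r1) & set(r2)


print(common_results(["film", "review"], ["movie", "critique"],
                     ["d1", "d2", "d3"], ["d3", "d4"]))  # → {'d3'}
```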
Scenario
• We assume that Rcommon is small and non-empty (Furnas, 1985; Furnas et al., 1987)
• If Rcommon were large, Q1 and Q2 would both retrieve the same set of documents
• We can determine (using WordNet) whether any term in Q1 is a synonym of a term in Q2
– Some document Dk in Rcommon probably contains both terms (because of the way Web IR works)!
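The synonym check can be sketched as follows. A hypothetical in-memory `SYNSETS` table stands in for WordNet here. In practice one might query WordNet itself, for example through NLTK's `wordnet.synsets()`, but that needs the corpus installed. The sketch returns each synonym pair (t1, t2) together with any document Dk in Rcommon that contains both terms. The `doc_terms` mapping from each document to its terms is also an assumption.

```python
# Hypothetical stand-in for WordNet: term -> list of synsets (sets of lemmas).
SYNSETS = {
    "film": [frozenset({"film", "movie", "picture"})],
    "movie": [frozenset({"film", "movie", "picture"})],
    "review": [frozenset({"review", "critique", "critical review"})],
    "critique": [frozenset({"review", "critique", "critical review"})],
}


def are_synonyms(t1, t2):
    """True if some synset of t1 also contains t2."""
    return any(t2 in synset for synset in SYNSETS.get(t1, []))


def synonym_pairs(q1, q2, r_common, doc_terms):
    """Return (t1, t2, Dk) triples where t1 in Q1 and t2 in Q2 are synonyms
    and document Dk in Rcommon contains both terms.

    doc_terms: {doc id: set of terms in that document} (assumed available).
    """
    triples = []
    for t1 in q1:
        for t2 in q2:
            if not are_synonyms(t1, t2):
                continue
            for doc in sorted(r_common):
                if {t1, t2} <= doc_terms.get(doc, set()):
                    triples.append((t1, t2, doc))
    return triples


print(synonym_pairs(["film", "review"], ["movie", "critique"], {"d3"},
                    {"d3": {"film", "movie", "review"}}))
# → [('film', 'movie', 'd3')]
```

The pair ("review", "critique") is rejected in this example because d3 does not contain "critique". That reflects the slide's point that the shared document is what links the two terms.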
Scenario
• If t1 in Q1 and t2 in Q2 are synonyms:
– Either can be expanded in future queries containing t1 or t2
– As long as document Dk appears in the results set (the context)
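The context-dependent expansion rule might be sketched as below. `learned` is a hypothetical store that maps each context document Dk to the synonym pairs discovered through it. A pair is used only when Dk appears in the current results set, and either term of the pair can trigger expansion with the other.

```python
def expand_in_context(query, results, learned):
    """Expand query using synonym pairs whose context document is in results.

    learned: {doc id Dk: [(t1, t2), ...]}, a hypothetical store of synonym
    pairs learned from earlier query pairs, keyed by their context document.
    """
    expanded = list(query)
    present = set(results)
    for doc, pairs in learned.items():
        if doc not in present:
            continue  # context document absent from this results set: skip
        for t1, t2 in pairs:
            # Either member of the pair can trigger expansion with the other.
            for term, partner in ((t1, t2), (t2, t1)):
                if term in query and partner not in expanded:
                    expanded.append(partner)
    return expanded


learned = {"d3": [("film", "movie")]}
print(expand_in_context(["film", "times"], ["d3", "d7"], learned))  # → ['film', 'times', 'movie']
print(expand_in_context(["film", "times"], ["d7"], learned))        # → ['film', 'times']
```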
Approach
• ‘Learning’ synonyms in context
• Query Expansion
‘Learning’ Synonyms in Context
• Each document is associated with a “bag of words” containing every term ever used to retrieve it
• Each (term, document) pair is associated with a synset for the term in the context of that document
– The WordNet word sense is also recorded, to reduce ambiguity
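Below is one possible data structure for the two associations on this slide, offered as a minimal sketch. The class and method names are hypothetical. In practice the synset and sense values would come from WordNet rather than being supplied by hand.

```python
from collections import defaultdict


class ContextLexicon:
    """Hypothetical store for the associations described on this slide."""

    def __init__(self):
        # Document id -> every query term ever used to retrieve that document.
        self.bag_of_words = defaultdict(set)
        # (term, document id) -> (word category, WordNet sense number, synset).
        self.synsets = {}

    def record_retrieval(self, query_terms, results):
        """Add the query's terms to the bag of words of each retrieved document."""
        for doc in results:
            self.bag_of_words[doc].update(query_terms)

    def record_synset(self, term, doc, category, sense, synset):
        """Remember which synset (and sense) the term has in the context of doc."""
        self.synsets[(term, doc)] = (category, sense, frozenset(synset))

    def synset(self, term, doc):
        """Return (category, sense, synset) for term in doc, or None if unknown."""
        return self.synsets.get((term, doc))


lex = ContextLexicon()
lex.record_retrieval(["car", "hire"], ["d1", "d2"])
lex.record_retrieval(["auto", "rental"], ["d1"])
lex.record_synset("car", "d1", "noun", 1, {"car", "auto", "automobile"})
print(sorted(lex.bag_of_words["d1"]))  # → ['auto', 'car', 'hire', 'rental']
```

Keying synsets by (term, document) rather than by term alone is what lets the same word carry different senses in different documents.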
Query Expansion in Context
• Submit the original, unexpanded user query Q to obtain the results set R
• For each document Dk in R (where k is its rank), retrieve the synsets for the terms in Q
• The same query term may yield inconsistent synsets in the context of different documents in R
– Countered using Inverse Document Relevance (IDR)
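The first two steps, together with a check for the inconsistency that the third bullet mentions, might look like this. The `results` list stands in for whatever a search engine returns for Q. `lexicon` is a hypothetical store that maps (term, document) to (word category, sense, synset).

```python
def synsets_in_context(query, results, lexicon):
    """For each document Dk in R (k = rank, from 1), collect the recorded
    synset entries for the terms of query Q."""
    found = {}
    for k, doc in enumerate(results, start=1):
        for term in query:
            entry = lexicon.get((term, doc))
            if entry is not None:
                found.setdefault(term, []).append((k, doc, entry))
    return found


def inconsistent_terms(found):
    """Query terms whose (word category, sense) differs across documents in R."""
    return sorted(
        term for term, hits in found.items()
        if len({(category, sense) for _, _, (category, sense, _) in hits}) > 1
    )


lexicon = {
    ("car", "d1"): ("noun", 1, frozenset({"car", "auto", "automobile"})),
    ("car", "d2"): ("noun", 2, frozenset({"car", "railcar", "railway car"})),
    ("hire", "d1"): ("verb", 1, frozenset({"hire", "rent"})),
}
found = synsets_in_context(["car", "hire"], ["d1", "d2"], lexicon)
print(inconsistent_terms(found))  # → ['car']
```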
Inverse Document Relevance
• IDR is the relative frequency with which document d is retrieved in rank k when term q occurs in the query
• $\mathrm{IDR}_{q,d} = W_{q,d} / W_d$, where $W_d$ is the number of times d has been retrieved and $W_{q,d}$ is the number of times d has been retrieved when q occurs in the query
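Here is a sketch of computing IDR from a query log, following the formula on this slide (which does not depend on the rank k). Two things are assumptions: the log format, a list of (query terms, ranked results) pairs, and counting a document as "retrieved" whenever it appears anywhere in a results list.

```python
from collections import Counter


def idr_table(query_log):
    """Return {(q, d): IDR_{q,d}} with IDR_{q,d} = W_{q,d} / W_d.

    query_log: list of (query_terms, ranked_results) pairs (assumed format).
    """
    w_d = Counter()   # W_d: number of times d was retrieved
    w_qd = Counter()  # W_{q,d}: times d was retrieved with q in the query
    for terms, results in query_log:
        for d in set(results):
            w_d[d] += 1
            for q in set(terms):
                w_qd[(q, d)] += 1
    return {(q, d): n / w_d[d] for (q, d), n in w_qd.items()}


log = [
    (["car", "rental"], ["d1", "d2"]),
    (["car", "hire"], ["d1", "d3"]),
    (["train", "carriage"], ["d2", "d1"]),
]
idr = idr_table(log)
print(idr[("car", "d1")], idr[("car", "d2")])  # → 0.6666666666666666 0.5
```

Document d1 was retrieved three times, twice for queries containing "car", so its IDR for "car" is 2/3.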
Term Document Relevance
• We then re-rank the documents in R by their Term Document Relevance (TDR)
• $\mathrm{TDR}_{q,d,k} = \mathrm{IDR}_{q,d} \times W_{q,d,k} / W_{d,k}$
• The synsets of the top 10 re-ranked documents are merged according to word category and sense
• The synset of the most frequently occurring (word category, word sense) pair is used to expand q in the query
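The re-ranking and merging steps can be sketched as below. Two readings here are assumptions because the slides do not define them. First, $W_{d,k}$ is taken as the number of times d was retrieved at rank k, and $W_{q,d,k}$ as the number of times that happened with q in the query. Second, "merging" synsets is read as taking the union of the synsets that share the winning (word category, sense) pair. The log format and the `lexicon` store are hypothetical.

```python
from collections import Counter


def rank_counts(query_log):
    """Count W_d, W_{q,d}, W_{d,k} and W_{q,d,k} from (terms, ranked results) pairs.

    Ranks k start at 1. W_{d,k} and W_{q,d,k} follow the assumed readings above.
    """
    w_d, w_qd, w_dk, w_qdk = Counter(), Counter(), Counter(), Counter()
    for terms, results in query_log:
        for k, d in enumerate(results, start=1):
            w_d[d] += 1
            w_dk[(d, k)] += 1
            for q in set(terms):
                w_qd[(q, d)] += 1
                w_qdk[(q, d, k)] += 1
    return w_d, w_qd, w_dk, w_qdk


def tdr(q, d, k, counts):
    """TDR_{q,d,k} = IDR_{q,d} * W_{q,d,k} / W_{d,k}; 0.0 if d never appeared at rank k."""
    w_d, w_qd, w_dk, w_qdk = counts
    if w_dk[(d, k)] == 0:
        return 0.0
    idr = w_qd[(q, d)] / w_d[d]
    return idr * w_qdk[(q, d, k)] / w_dk[(d, k)]


def expansion_synset(q, results, counts, lexicon, top_n=10):
    """Pick the synset used to expand q, given the current ranked results R.

    lexicon: {(term, doc): (word category, sense, synset)} (hypothetical store).
    """
    # Re-rank R by TDR; the sort is stable, so ties keep their original rank order.
    scored = [(tdr(q, d, k, counts), d) for k, d in enumerate(results, start=1)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top_docs = [d for _, d in scored[:top_n]]
    # Merge the top documents' synsets by (word category, sense) and count votes.
    votes, merged = Counter(), {}
    for d in top_docs:
        entry = lexicon.get((q, d))
        if entry is None:
            continue
        category, sense, synset = entry
        votes[(category, sense)] += 1
        merged.setdefault((category, sense), set()).update(synset)
    if not votes:
        return None
    best_pair, _ = votes.most_common(1)[0]
    return merged[best_pair]


log = [
    (["car", "rental"], ["d1", "d2"]),
    (["car", "hire"], ["d1", "d3"]),
    (["train", "carriage"], ["d2", "d1"]),
]
counts = rank_counts(log)
lexicon = {
    ("car", "d1"): ("noun", 1, {"car", "auto", "automobile"}),
    ("car", "d2"): ("noun", 1, {"car", "motorcar"}),
    ("car", "d3"): ("noun", 2, {"car", "railcar", "railway car"}),
}
print(sorted(expansion_synset("car", ["d1", "d2", "d3"], counts, lexicon)))
# → ['auto', 'automobile', 'car', 'motorcar']
```

The "railcar" sense recorded for d3 is outvoted because two higher-ranked documents agree on the automobile sense. This is how the scheme is meant to counter inconsistent synsets within R.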
Evaluation
• We need a huge query log, ideally with relevance judgements for the queries
• We have the TREC QA collection, but we will need to index it (using, e.g., SMART) before running the test queries through it
– Disadvantage: there might not be enough queries
• User Studies