Tanvi Motwani- A Few Examples Go A Long Way
Transcript of Tanvi Motwani- A Few Examples Go A Long Way
Constructing Query Models from Elaborate Query Formulations
A Few Examples Go A Long Way
Krisztian Balog, Wouter Weerkamp, Maarten de Rijke
ISLA, University of Amsterdam
Presented by Tanvi Motwani
Aim
• This paper introduces and compares several methods for sampling query expansion terms, using both query-independent and query-dependent techniques.
• In addition to the query, the methods take sample documents as input. Sample documents are extra information provided by the user: a small number of “key references” (pages that a good overview page on the topic should link to).
• The aim is to increase “aspect recall” by uncovering aspects of the information need that are not captured by the query but are present in the sample documents.
Aspect Retrieval
Query: What are current applications of robotics?
Find as many different applications as possible.
Example Aspects
A1: spot-welding robotics
A2: controlling inventory
A3: pipe-laying robots
A4: talking robot
A5: robots for loading & unloading memory tapes
A6: robot telephone operators
A7: robot cranes …

Aspect judgments (1 = document covers the aspect):

|    | A1 | A2 | A3 | A4 | … | A(k−1) | Ak |
|----|----|----|----|----|---|--------|----|
| d1 | 1  | 1  | 0  | 0  | … | 0      | 0  |
| d2 | 0  | 1  | 1  | 1  | … | 0      | 0  |
| d3 | 0  | 0  | 0  | 0  | … | 1      | 0  |
| …  |    |    |    |    |   |        |    |
| dk | 1  | 0  | 1  | 0  | … | 0      | 1  |
Overview
• Retrieval Model
• Experimental Setup
• Query Representation
• Baseline Parameters
• Experimental Evaluation
Overview
• Retrieval Model
  – Query Likelihood
  – Document Modeling
  – Query Modeling
• Experimental Setup
• Query Representation
• Baseline Parameters
• Experimental Evaluation
What is a Rainforest?
P(D1|Q) = 0.32
P(D2|Q) = 0.26
P(D3|Q) = 0.19
P(D4|Q) = 0.12
P(D5|Q) = 0.09
(Figure: the query Q scored against documents D1–D5.)
Query Likelihood
• Bayes’ Rule: P(D|Q) = P(Q|D) · P(D) / P(Q)
• Ignoring P(Q): P(D|Q) ∝ P(Q|D) · P(D)
• Assuming independence of query terms: P(Q|D) = ∏i P(qi|D)
• Taking logs: log P(D|Q) ∝ Σi log P(qi|D) + log P(D)
• Using query and document models: Score(Q, D) = Σt P(t|θQ) · log P(t|θD)
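The derivation above can be sketched as a toy scoring function. This is an illustration, not the authors’ code; the Jelinek-Mercer smoothing and the λ = 0.5 weight are assumptions:

```python
import math

def query_likelihood(query, doc, collection, lam=0.5):
    """log P(Q|D), smoothing P(t|D) with Jelinek-Mercer mixing:
    P(t|D) = lam * ML(t|D) + (1 - lam) * ML(t|collection)."""
    coll_len = sum(len(d) for d in collection)
    score = 0.0
    for t in query:
        p_doc = doc.count(t) / len(doc)
        p_coll = sum(d.count(t) for d in collection) / coll_len
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score

docs = [["rain", "forest", "wild"], ["robot", "crane", "rain"]]
q = ["rain", "forest"]
scores = [query_likelihood(q, d, docs) for d in docs]
# the document containing both query terms scores higher
```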
What is a Rainforest?
(Figure: the query and the documents as samples from an underlying Relevance Model.)
Underlying Relevance Model
The query and the relevant documents are random samples from an underlying relevance model R.
Documents are ranked by the similarity of their models to the query model; the Kullback-Leibler divergence between the query and document models can be used to provide this ranking.
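Since the query-model entropy is constant per query, ranking by negative KL divergence reduces to ranking by cross entropy, Σt P(t|Q) · log P(t|D). A minimal sketch with made-up, already-smoothed toy distributions:

```python
import math

def kl_rank_score(query_model, doc_model):
    """Negative KL(query || doc) up to a query-only constant:
    ranking reduces to sum_t P(t|Q) * log P(t|D)."""
    return sum(p * math.log(doc_model[t]) for t, p in query_model.items())

q_model = {"rain": 0.5, "forest": 0.5}
d1 = {"rain": 0.4, "forest": 0.4, "wild": 0.2}   # toy smoothed models
d2 = {"rain": 0.1, "forest": 0.1, "robot": 0.8}
s1 = kl_rank_score(q_model, d1)
s2 = kl_rank_score(q_model, d2)
# d1, whose model is closer to the query model, ranks first
```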
Document Modeling
• Maximum likelihood estimate: P(t|D) = n(t, D) / |D|
• Smoothed ML estimate: P(t|D) = λ · P_ML(t|D) + (1 − λ) · P(t|collection)
• A document that never mentions “Rain” has P(“Rain”|D) = 0 under the ML estimate, thus smoothing is required.
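The zero-probability problem and its fix can be shown in a few lines. Jelinek-Mercer mixing is one common smoothing choice, and λ = 0.8 is an illustrative value, not taken from the slides:

```python
def ml_estimate(term, doc):
    """Maximum-likelihood estimate P(t|D): raw count over document length."""
    return doc.count(term) / len(doc)

def jm_smoothed(term, doc, collection, lam=0.8):
    """Jelinek-Mercer smoothing: mix the document ML estimate with a
    background collection model so unseen terms keep nonzero mass."""
    coll = [t for d in collection for t in d]
    return lam * ml_estimate(term, doc) + (1 - lam) * coll.count(term) / len(coll)

doc = ["wild", "life", "tropical"]
collection = [doc, ["rain", "forest", "rain"]]
p_ml = ml_estimate("rain", doc)              # 0.0: the zero-probability problem
p_sm = jm_smoothed("rain", doc, collection)  # positive after smoothing
```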
Query Modeling
• P(t|Q) is extremely sparse, thus query expansion is necessary.
• A relevant document may not contain the words “Rain” and “Forest” but may contain related words such as “Wildlife”. Expanding the query brings in different “aspects” of the topic.
Experimental Setup
• CSIRO Enterprise Research Collection (CERC): a crawl of the *.csiro.au web site conducted in March 2007
• 370,715 documents
• Size: 4.2 gigabytes
• 50 topics
• Judgments on a 3-point scale:
  – 2: highly relevant (“key reference”)
  – 1: candidate key page
  – 0: not a “key reference”
Parameter Estimation
• Maximizing Average Precision (MAX_AP)
• Maximizing Query Log-Likelihood (MAX_QLL)
• Best Empirical Estimate (EMP_BEST)
Evaluation
• The maximum AP score is reached when the weight is 0.6.
• MAX_QLL performs slightly better than MAX_AP.
Overview
• Retrieval Model
• Experimental Setup
• Query Representation
  – Feedback Using Relevance Models
  – Relevance Models from Sample Documents
  – Query Model from Sample Documents
• Baseline Parameters
• Experimental Evaluation
Query Representation
• The expanded query terms are combined with the original query terms.
• This prevents the topic from drifting away from the original user information need.
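This combination can be sketched as a simple interpolation of the two term distributions; the λ = 0.5 weight is illustrative, not taken from the slides:

```python
def combine_query_models(original, expanded, lam=0.5):
    """Interpolate the expanded query model with the original one:
    P(t|Q) = lam * P_orig(t) + (1 - lam) * P_exp(t).
    Keeping probability mass on the original terms limits topic drift."""
    terms = set(original) | set(expanded)
    return {t: lam * original.get(t, 0.0) + (1 - lam) * expanded.get(t, 0.0)
            for t in terms}

orig = {"rainforest": 1.0}
expanded = {"wildlife": 0.6, "tropical": 0.4}
combined = combine_query_models(orig, expanded)
# the original term retains half the mass; expansion terms share the rest
```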
Feedback Using Relevance Models
Estimate P(t|R) as the joint probability of observing t together with the query terms q1, …, qk, divided by the joint probability of the query terms.
• RM1: t and the qi are assumed to be sampled independently and identically from the same distribution.
• RM2: the qi are sampled dependent on t, but independently of each other.
RM1
Worked example (assume the smoothing weight is 0, and M is just this single document):
• “wild” appears 5 times, “rain” 20 times, and “forest” 30 times in the document.
• The document has 150 unique terms.
• P(D1) = 1/5
• P(“wild”, “rain”, “forest”) = 1/5 · 5/150 · 20/150 · 30/150
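The slide’s RM1 arithmetic can be checked directly (toy numbers from the slide, no smoothing):

```python
# RM1 worked example: M contains a single document D1.
counts = {"wild": 5, "rain": 20, "forest": 30}
vocab = 150      # unique terms in the document, per the slide
p_d1 = 1 / 5     # document prior P(D1)

p_joint = p_d1
for term in ("wild", "rain", "forest"):
    p_joint *= counts[term] / vocab   # no smoothing (weight = 0)
# p_joint equals 1/5 * 5/150 * 20/150 * 30/150 = 2/11250
```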
RM2
Given the term “wild” we first pick a document from M set with probability P(D|t) and then sample query words from the document.
Worked example (assume M is just this document):
• P(D | “wild”) = 0.7 and P(“wild”) = 0.2
• The document contains 10 occurrences of “rain” and 20 of “forest”, and has 200 unique words.
• P(“wild”, “rain”, “forest”) = 0.2 · 0.7 · 20/200 · 10/200
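As with RM1, the RM2 arithmetic on the slide can be verified in a few lines (toy numbers from the slide):

```python
# RM2 worked example: given "wild", pick a document with P(D|"wild"),
# then sample the remaining query words from that document.
p_wild = 0.2             # P("wild")
p_doc_given_wild = 0.7   # P(D | "wild")
unique_words = 200
p_forest = 20 / unique_words
p_rain = 10 / unique_words

p_joint = p_wild * p_doc_given_wild * p_forest * p_rain
# 0.2 * 0.7 * 0.1 * 0.05 = 0.0007
```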
Relevance Models from Sample Documents
• Apply the relevance models to the sample documents instead of the feedback documents, i.e. set M = S.
• For RM1, assume P(D) = 1/|S|.
Query Model from Sample Documents
The top K terms with the highest probability P(t|S) are used to formulate the expanded query:
1. Start from the sample document set S.
2. Select a document D from S with probability P(D|S).
3. From this document, generate term t with probability P(t|D).
4. Sum over all sample documents to obtain P(t|S).
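The four steps above amount to a mixture over the sample documents. A minimal sketch, assuming uniform P(D|S) and maximum-likelihood P(t|D) (both choices are assumptions for illustration):

```python
from collections import Counter

def query_model_from_samples(sample_docs, k):
    """P(t|S) = sum over D of P(D|S) * P(t|D);
    uniform P(D|S) = 1/|S| and ML estimates of P(t|D) assumed."""
    p_d = 1 / len(sample_docs)
    p_t = Counter()
    for doc in sample_docs:
        for term, count in Counter(doc).items():
            p_t[term] += p_d * count / len(doc)
    return [t for t, _ in p_t.most_common(k)]

samples = [["rain", "forest", "rain"], ["wild", "life", "rain"]]
top_terms = query_model_from_samples(samples, k=2)
# "rain" gets the most mass, appearing in both sample documents
```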
Query Model from Sample Documents
• Maximum Likelihood Estimate of a term (EX-QM-ML)
• Smoothed Estimate of a term (EX-QM-SM)
• Ranking Function proposed by Ponte and Croft for unsupervised query expansion (EX-QM-EXP)
Query Model from Sample Documents
Three options for estimating P(D|S):
• Uniform
• Query-biased
• Inverse query-biased
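The slide’s formulas for these options are not preserved in the transcript. The sketch below encodes one plausible reading (query-biased proportional to P(Q|D), inverse proportional to 1 − P(Q|D), both normalized) and should be treated as an assumption:

```python
def doc_weights(query_likelihoods, mode="uniform"):
    """Three ways to set P(D|S) over the sample documents.
    The query-biased and inverse forms are assumed readings, not
    the slide's exact formulas."""
    if mode == "uniform":
        raw = [1.0] * len(query_likelihoods)
    elif mode == "query-biased":
        raw = list(query_likelihoods)               # favor docs matching Q
    elif mode == "inverse":
        raw = [1.0 - p for p in query_likelihoods]  # favor aspects Q misses
    else:
        raise ValueError(mode)
    total = sum(raw)
    return [r / total for r in raw]

likes = [0.6, 0.3, 0.1]   # toy P(Q|D) values for three sample documents
w_uniform = doc_weights(likes, "uniform")
w_biased = doc_weights(likes, "query-biased")
w_inverse = doc_weights(likes, "inverse")
```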
Expanded Query Models
Combination with Original Query
Importance of Sample Document
Topic Level Comparison
Topic Level Comparison
Sampling conditioned on query
Conclusion
• Introduced a method for sampling query expansion terms in a query-independent way, based on sample documents that reflect “aspects” of the user’s information need not captured by the query.
• Introduced several variants of the expansion-term selection method, based on different term-selection and document-importance weighting schemes, and compared them against more traditional query expansion performed in a query-biased manner.
Questions/Discussion
• Every topic needs a sample document set; is this method feasible in a real-world setting where there are countless topics?
• Aspect recall is obtained from the sample documents; aren’t we dependent on the “goodness” of the sample documents, i.e. the number of different aspects they cover, for obtaining high aspect recall?
• The increase in MAP over BFB-RM2 is slight (around 0.07); for an end user, will it make any noticeable difference in experience? Is such a small gain in MAP worth the high cost of obtaining sample documents?