Active Feedback in Ad Hoc IR

Xuehua Shen, ChengXiang Zhai
Department of Computer Science
University of Illinois, Urbana-Champaign
Slide 2: Normal Relevance Feedback (RF)

[Diagram: the User issues a Query to the Retrieval System, which searches the Document Collection and returns the Top K Results (d1 3.5, d2 2.4, …, dk 0.5); the User provides Judgments (d1 +, d2 -, …, dk -), which are fed back into the system as Feedback.]
Slide 3: Document Selection in RF

[Diagram: the same feedback loop, except the Retrieval System must now decide which k docs to present to the User for judgment.]

Can we do better than just presenting the top K? (Consider diversity…)
Slide 4: Active Feedback (AF)

An IR system actively selects documents for obtaining relevance judgments.

If a user is willing to judge K documents, which K documents should we present in order to maximize learning effectiveness?
Slide 5: Outline

• Framework and specific methods
• Experiment design and results
• Summary and future work
Slide 6: A Framework for Active Feedback

• Consider active feedback as a decision problem
  – Decide which K documents (D) to present for relevance judgment
• Formalize it as an optimization problem
  – Minimize the expected loss (i.e., maximize the expected learning benefit) of requesting relevance judgments on D from the user
• Consider two cases of the loss function, according to the interaction between documents
Slide 7: Formula of the Framework

D^* = \arg\min_D \int_U L(D, U)\, p(U \mid q, C)\, dU

L(D, U) = \sum_{j} l(D, j, U)\, p(j \mid D, U)
        = \sum_{j} l(D, j, U) \prod_{i=1}^{k} p(j_i \mid d_i, U)

Here l(D, j, U) is the value of the documents for learning, the sum ranges over the different possible judgments j = (j_1, …, j_k), and the second line assumes the k documents are judged independently.
Slide 8: Independent Loss

Assume the loss decomposes over individual documents:

l(D, j, U) = \sum_{i=1}^{k} l(d_i, j_i, U)

Then

L(D, U) = \sum_{i=1}^{k} \sum_{j_i} l(d_i, j_i, U)\, p(j_i \mid d_i, U)

and, defining the expected loss of each document as

r(d_i) = \int_U \sum_{j_i} l(d_i, j_i, U)\, p(j_i \mid d_i, U)\, p(U \mid q, C)\, dU

the selection problem reduces to

D^* = \arg\min_D \sum_{i=1}^{k} r(d_i)
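In code, the independent-loss case is just a ranking of candidates by their per-document expected loss. A minimal Python sketch (not from the paper; `expected_loss` stands in for whatever estimate of r(d_i) the retrieval model provides):

```python
def select_independent(candidates, expected_loss, k):
    """Independent-loss active feedback: because the total loss is a sum
    of per-document expected losses r(d_i), the optimal set D* consists
    of the K candidates with the smallest r(d_i)."""
    return sorted(candidates, key=expected_loss)[:k]
```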
Slide 9: Independent Loss (cont.)

Uncertainty Sampling: take the loss to be the log-likelihood of the judgment,

l(d_i, 1, U) = \log p(R = 1 \mid d_i, C),  l(d_i, 0, U) = \log p(R = 0 \mid d_i, C)

which gives

r(d_i) = -\int_U H(R \mid d_i, C)\, p(U \mid q, C)\, dU

so the more uncertain a document, the more useful it is.

Top K: take the loss to be constant,

l(d_i, 1, U) = C_1,  l(d_i, 0, U) = C_0,  with C_1 < C_0

(relevant docs are more useful than non-relevant docs), which gives

r(d_i) = C_0 + (C_1 - C_0) \int_U p(j_i = 1 \mid d_i, U)\, p(U \mid q, C)\, dU

so minimizing the loss selects the documents most likely to be relevant, i.e., the top K.
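As an illustration, here is a sketch of the two loss instantiations as scoring functions for `select_independent` above, assuming we already have an estimate `p_rel` of each document's relevance probability (an assumption, not part of the slides):

```python
import math

def r_uncertainty(p_rel):
    """Uncertainty sampling: the expected loss is the negative entropy of
    the relevance variable, so the most uncertain documents (p_rel near
    0.5) get the smallest loss and are selected first."""
    eps = 1e-12  # guard against log(0)
    p = min(max(p_rel, eps), 1 - eps)
    return p * math.log(p) + (1 - p) * math.log(1 - p)

def r_top_k(p_rel, c1=0.0, c0=1.0):
    """Top-K as a special case: with constant losses C1 < C0, the expected
    loss decreases linearly in p_rel, so the documents most likely to be
    relevant are selected first."""
    return c0 + (c1 - c0) * p_rel
```

With `r_top_k`, the selection rule reproduces ordinary top-K feedback; with `r_uncertainty`, it prefers the documents the current model is least sure about.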
Slide 10: Dependent Loss

When documents interact, the loss should reward both relevance and diversity:

L(D, U) = -\sum_{i=1}^{k} p(j_i = 1 \mid d_i, U) - \Delta(D)

where \Delta(D) measures the diversity of D: the more relevant, the more useful; the more diverse, the more useful.

Heuristic instantiations (sketched below) all first select the top N docs of the baseline retrieval, then:
• K Cluster Centroid: cluster the N docs into K clusters and present the K cluster centroids
• MMR (maximal marginal relevance)
• Gapped Top K: pick one doc every G+1 docs
• …
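A minimal sketch of two of the diversity-seeking heuristics (an illustration, not the authors' code; it assumes a baseline ranking, a `vectors` map from doc id to feature vector, and a `kmeans` routine supplied by the caller):

```python
import math

def gapped_top_k(ranked_docs, k, gap):
    """Gapped Top-K: walk down the baseline ranking and pick one doc
    every gap+1 positions, trading a little relevance for diversity."""
    return [ranked_docs[i * (gap + 1)] for i in range(k)]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def k_cluster_centroid(ranked_docs, vectors, k, n, kmeans):
    """K Cluster Centroid: cluster the top N docs into K clusters and
    present the doc closest to each centroid. `kmeans` is any routine
    returning (labels, centroids) for a list of vectors."""
    top_n = ranked_docs[:n]
    labels, centroids = kmeans([vectors[d] for d in top_n], k)
    selected = []
    for c in range(k):
        members = [d for d, lab in zip(top_n, labels) if lab == c]
        if not members:
            continue  # skip empty clusters in this sketch
        selected.append(min(members, key=lambda d: euclidean(vectors[d], centroids[c])))
    return selected
```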
Slide 11: Illustration of Three AF Methods

[Figure: a ranked list 1, 2, 3, …, 16, …. Top-K (normal feedback) takes the first K documents in rank order; Gapped Top-K skips down the list at a fixed gap; K-Cluster Centroid picks one representative per cluster, aiming at high diversity.]
Slide 12: Evaluating Active Feedback

[Diagram of the experimental setup: for each Query, the system selects K docs (Top-K, Gapped, or Clustering); their relevance is looked up in the Judgment File, yielding judged docs (+/-); the judged docs drive Feedback, producing Feedback Results, which are compared against the Initial Results obtained with No Feedback.]
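In pseudocode, the evaluation protocol amounts to the following loop (a sketch; `retrieve`, `select`, and `feedback` are hypothetical stand-ins for the retrieval run, the AF selection method, and the feedback-based query update):

```python
def evaluate_af(queries, qrels, retrieve, select, feedback, k):
    """Simulate active feedback against a judgment file (qrels):
    select K docs, look up their judgments, update the query with the
    judged docs, and re-run retrieval."""
    results = {}
    for q in queries:
        initial = retrieve(q)               # baseline run, no feedback
        presented = select(initial, k)      # Top-K / Gapped / Clustering
        judged = {d: qrels[q].get(d, 0) for d in presented}
        results[q] = retrieve(feedback(q, judged))  # feedback run
    return results
```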
Slide 13: Retrieval Methods (Lemur toolkit)

[Diagram: the query Q is mapped to a query language model \theta_Q and each document D to a document model \theta_D; results are ranked by the KL divergence D(\theta_Q \| \theta_D). The feedback docs F = {d1, …, dn} selected by Active Feedback feed Mixture Model Feedback, which interpolates a feedback model \theta_F into the query model:]

\theta_{Q'} = (1 - \alpha)\, \theta_Q + \alpha\, \theta_F

Only relevant docs are learned from. Default parameter settings are used unless otherwise stated.
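A minimal sketch of the query-model update, assuming unigram models represented as word-to-probability dicts (the estimation of the feedback model \theta_F itself, done in Lemur via the mixture model, is not reproduced here):

```python
def interpolate_query_model(theta_q, theta_f, alpha):
    """Mixture-model feedback update: theta_Q' = (1 - alpha) * theta_Q
    + alpha * theta_F, where both models map words to probabilities."""
    words = set(theta_q) | set(theta_f)
    return {w: (1 - alpha) * theta_q.get(w, 0.0) + alpha * theta_f.get(w, 0.0)
            for w in words}
```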
Slide 14: Comparison of Three AF Methods
(judged docs are included in the evaluated rankings)

Collection | Active FB Method | #AFRel per topic | MAP     | Pr@10doc
-----------|------------------|------------------|---------|---------
HARD 2003  | Baseline         | /                | 0.301   | 0.501
HARD 2003  | Pseudo FB        | /                | 0.320   | 0.515
HARD 2003  | Top-K            | 3.0              | 0.325   | 0.527
HARD 2003  | Gapped           | 2.6              | 0.330** | 0.548*
HARD 2003  | Clustering       | 2.4              | 0.332   | 0.565
AP88-89    | Baseline         | /                | 0.201   | 0.326
AP88-89    | Pseudo FB        | /                | 0.218   | 0.343
AP88-89    | Top-K            | 2.2              | 0.228   | 0.351
AP88-89    | Gapped           | 1.5              | 0.234*  | 0.389**
AP88-89    | Clustering       | 1.3              | 0.237** | 0.393**

Top-K is the worst! Clustering uses the fewest relevant docs.
Slide 15: Appropriate Evaluation of Active Feedback

• Original DB with judged docs (AP88-89, HARD): can't tell if the ranking of un-judged documents is improved
• Original DB without judged docs: different methods end up with different test documents
• New DB (feedback on AP88-89, test on AP90): shows the learning effect more explicitly, but the new docs must be similar to the original docs
Slide 16: Retrieval Performance on AP90 Dataset

Method | Baseline | Pseudo FB | Top K | Gapped Top K | K Cluster Centroid
-------|----------|-----------|-------|--------------|-------------------
MAP    | 0.203    | 0.220     | 0.220 | 0.222        | 0.223
pr@10  | 0.295    | 0.317     | 0.321 | 0.326**      | 0.325

Top-K is consistently the worst!
Slide 17: Mixture Model Parameter Factor

[Chart: effect of the mixture-model parameter alpha on pr@10docs. x axis: alpha in {0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98}; y axis: pr@10docs from 0 to 0.7. Six curves: Top K, Gapped Top K, and K Cluster Centroid, each on HARD and on AP88-89.]

\theta_{Q'} = (1 - \alpha)\, \theta_Q + \alpha\, \theta_F
Slide 18: Summary

• Introduce the active feedback problem
• Propose a preliminary framework and three methods (Top-K, Gapped Top-K, Clustering)
• Study the evaluation strategy
• Experimental results show that
  – Presenting the top K is not the best strategy
  – Clustering can generate fewer, higher-quality feedback examples
Slide 19: Future Work

• Explore other methods for active feedback
• Develop a general framework
• Combine pseudo feedback and active feedback
Slide 20: Thank You!

The End