Active Feedback in Ad Hoc IR

Xuehua Shen, ChengXiang Zhai
Department of Computer Science
University of Illinois, Urbana-Champaign
Slide 2: Normal Relevance Feedback (RF)

[Diagram: the User issues a Query to the Retrieval System, which searches the Document Collection and returns the Top K Results (d1 3.5, d2 2.4, …, dk 0.5); the User provides Judgments (d1 +, d2 -, …, dk -), which are fed back into the system as Feedback.]
Slide 3: Document Selection in RF

[Diagram: the same feedback loop, except the Retrieval System must now decide which k docs to present to the User for judgment.]

Can we do better than just presenting the top K? (Consider diversity…)
Slide 4: Active Feedback (AF)

An IR system actively selects documents for obtaining relevance judgments.

If a user is willing to judge K documents, which K documents should we present in order to maximize learning effectiveness?
Slide 5: Outline

• Framework and specific methods
• Experiment design and results
• Summary and future work
Slide 6: A Framework for Active Feedback

• Consider active feedback as a decision problem
  – Decide which K documents (D) to present for relevance judgment
• Formalize it as an optimization problem
  – Minimize the expected loss (i.e., maximize the expected learning benefit) of requesting relevance judgments on D from the user
• Consider two cases of the loss function, according to the interaction between documents
Slide 7: Formula of the Framework

D^* = \arg\min_D \int_U L(D, U)\, p(U \mid q, C)\, dU

L(D, U) = \sum_{j} l(D, j, U)\, p(j \mid D, U)
        = \sum_{j} l(D, j, U) \prod_{i=1}^{k} p(j_i \mid d_i, U)

Here l(D, j, U) is the value of the documents for learning, the sum ranges over the different possible judgments j = (j_1, …, j_k), and the second line assumes the k documents are judged independently.
Slide 8: Independent Loss

Assume the loss decomposes over individual documents:

l(D, j, U) = \sum_{i=1}^{k} l(d_i, j_i, U)

Then

L(D, U) = \sum_{i=1}^{k} \sum_{j_i} l(d_i, j_i, U)\, p(j_i \mid d_i, U)

and, defining the expected loss of each document as

r(d_i) = \int_U \sum_{j_i} l(d_i, j_i, U)\, p(j_i \mid d_i, U)\, p(U \mid q, C)\, dU

the selection problem reduces to

D^* = \arg\min_D \sum_{i=1}^{k} r(d_i)
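In code, the independent-loss case is just a ranking of candidates by their per-document expected loss. A minimal Python sketch (not from the paper; `expected_loss` stands in for whatever estimate of r(d_i) the retrieval model provides):

```python
def select_independent(candidates, expected_loss, k):
    """Independent-loss active feedback: because the total loss is a sum
    of per-document expected losses r(d_i), the optimal set D* consists
    of the K candidates with the smallest r(d_i)."""
    return sorted(candidates, key=expected_loss)[:k]
```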
Slide 9: Independent Loss (cont.)

Uncertainty Sampling: take the loss to be the log-likelihood of the judgment,

l(d_i, 1, U) = \log p(R = 1 \mid d_i, C),  l(d_i, 0, U) = \log p(R = 0 \mid d_i, C)

which gives

r(d_i) = -\int_U H(R \mid d_i, C)\, p(U \mid q, C)\, dU

so the more uncertain a document, the more useful it is.

Top K: take the loss to be constant,

l(d_i, 1, U) = C_1,  l(d_i, 0, U) = C_0,  with C_1 < C_0

(relevant docs are more useful than non-relevant docs), which gives

r(d_i) = C_0 + (C_1 - C_0) \int_U p(j_i = 1 \mid d_i, U)\, p(U \mid q, C)\, dU

so minimizing the loss selects the documents most likely to be relevant, i.e., the top K.
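As an illustration, here is a sketch of the two loss instantiations as scoring functions for `select_independent` above, assuming we already have an estimate `p_rel` of each document's relevance probability (an assumption, not part of the slides):

```python
import math

def r_uncertainty(p_rel):
    """Uncertainty sampling: the expected loss is the negative entropy of
    the relevance variable, so the most uncertain documents (p_rel near
    0.5) get the smallest loss and are selected first."""
    eps = 1e-12  # guard against log(0)
    p = min(max(p_rel, eps), 1 - eps)
    return p * math.log(p) + (1 - p) * math.log(1 - p)

def r_top_k(p_rel, c1=0.0, c0=1.0):
    """Top-K as a special case: with constant losses C1 < C0, the expected
    loss decreases linearly in p_rel, so the documents most likely to be
    relevant are selected first."""
    return c0 + (c1 - c0) * p_rel
```

With `r_top_k`, the selection rule reproduces ordinary top-K feedback; with `r_uncertainty`, it prefers the documents the current model is least sure about.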
Slide 10: Dependent Loss

When documents interact, the loss should reward both relevance and diversity:

L(D, U) = -\sum_{i=1}^{k} p(j_i = 1 \mid d_i, U) - \Delta(D)

where \Delta(D) measures the diversity of D: the more relevant, the more useful; the more diverse, the more useful.

Heuristic instantiations (sketched below) all first select the top N docs of the baseline retrieval, then:
• K Cluster Centroid: cluster the N docs into K clusters and present the K cluster centroids
• MMR (maximal marginal relevance)
• Gapped Top K: pick one doc every G+1 docs
• …
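A minimal sketch of two of the diversity-seeking heuristics (an illustration, not the authors' code; it assumes a baseline ranking, a `vectors` map from doc id to feature vector, and a `kmeans` routine supplied by the caller):

```python
import math

def gapped_top_k(ranked_docs, k, gap):
    """Gapped Top-K: walk down the baseline ranking and pick one doc
    every gap+1 positions, trading a little relevance for diversity."""
    return [ranked_docs[i * (gap + 1)] for i in range(k)]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def k_cluster_centroid(ranked_docs, vectors, k, n, kmeans):
    """K Cluster Centroid: cluster the top N docs into K clusters and
    present the doc closest to each centroid. `kmeans` is any routine
    returning (labels, centroids) for a list of vectors."""
    top_n = ranked_docs[:n]
    labels, centroids = kmeans([vectors[d] for d in top_n], k)
    selected = []
    for c in range(k):
        members = [d for d, lab in zip(top_n, labels) if lab == c]
        if not members:
            continue  # skip empty clusters in this sketch
        selected.append(min(members, key=lambda d: euclidean(vectors[d], centroids[c])))
    return selected
```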
Slide 11: Illustration of Three AF Methods

[Figure: a ranked list 1, 2, 3, …, 16, …. Top-K (normal feedback) takes the first K documents in rank order; Gapped Top-K skips down the list at a fixed gap; K-Cluster Centroid picks one representative per cluster, aiming at high diversity.]
Slide 12: Evaluating Active Feedback

[Diagram of the experimental setup: for each Query, the system selects K docs (Top-K, Gapped, or Clustering); their relevance is looked up in the Judgment File, yielding judged docs (+/-); the judged docs drive Feedback, producing Feedback Results, which are compared against the Initial Results obtained with No Feedback.]
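In pseudocode, the evaluation protocol amounts to the following loop (a sketch; `retrieve`, `select`, and `feedback` are hypothetical stand-ins for the retrieval run, the AF selection method, and the feedback-based query update):

```python
def evaluate_af(queries, qrels, retrieve, select, feedback, k):
    """Simulate active feedback against a judgment file (qrels):
    select K docs, look up their judgments, update the query with the
    judged docs, and re-run retrieval."""
    results = {}
    for q in queries:
        initial = retrieve(q)               # baseline run, no feedback
        presented = select(initial, k)      # Top-K / Gapped / Clustering
        judged = {d: qrels[q].get(d, 0) for d in presented}
        results[q] = retrieve(feedback(q, judged))  # feedback run
    return results
```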
Slide 13: Retrieval Methods (Lemur toolkit)

[Diagram: the query Q is mapped to a query language model \theta_Q and each document D to a document model \theta_D; results are ranked by the KL divergence D(\theta_Q \| \theta_D). The feedback docs F = {d1, …, dn} selected by Active Feedback feed Mixture Model Feedback, which interpolates a feedback model \theta_F into the query model:]

\theta_{Q'} = (1 - \alpha)\, \theta_Q + \alpha\, \theta_F

Only relevant docs are learned from. Default parameter settings are used unless otherwise stated.
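A minimal sketch of the query-model update, assuming unigram models represented as word-to-probability dicts (the estimation of the feedback model \theta_F itself, done in Lemur via the mixture model, is not reproduced here):

```python
def interpolate_query_model(theta_q, theta_f, alpha):
    """Mixture-model feedback update: theta_Q' = (1 - alpha) * theta_Q
    + alpha * theta_F, where both models map words to probabilities."""
    words = set(theta_q) | set(theta_f)
    return {w: (1 - alpha) * theta_q.get(w, 0.0) + alpha * theta_f.get(w, 0.0)
            for w in words}
```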
Slide 14: Comparison of Three AF Methods
(judged docs are included in the evaluated rankings)

Collection | Active FB Method | #AFRel per topic | MAP     | Pr@10doc
-----------|------------------|------------------|---------|---------
HARD 2003  | Baseline         | /                | 0.301   | 0.501
HARD 2003  | Pseudo FB        | /                | 0.320   | 0.515
HARD 2003  | Top-K            | 3.0              | 0.325   | 0.527
HARD 2003  | Gapped           | 2.6              | 0.330** | 0.548*
HARD 2003  | Clustering       | 2.4              | 0.332   | 0.565
AP88-89    | Baseline         | /                | 0.201   | 0.326
AP88-89    | Pseudo FB        | /                | 0.218   | 0.343
AP88-89    | Top-K            | 2.2              | 0.228   | 0.351
AP88-89    | Gapped           | 1.5              | 0.234*  | 0.389**
AP88-89    | Clustering       | 1.3              | 0.237** | 0.393**

Top-K is the worst! Clustering uses the fewest relevant docs.
Slide 15: Appropriate Evaluation of Active Feedback

• Original DB with judged docs (AP88-89, HARD): can't tell if the ranking of un-judged documents is improved
• Original DB without judged docs: different methods end up with different test documents
• New DB (feedback on AP88-89, test on AP90): shows the learning effect more explicitly, but the new docs must be similar to the original docs
Slide 16: Retrieval Performance on AP90 Dataset

Method | Baseline | Pseudo FB | Top K | Gapped Top K | K Cluster Centroid
-------|----------|-----------|-------|--------------|-------------------
MAP    | 0.203    | 0.220     | 0.220 | 0.222        | 0.223
pr@10  | 0.295    | 0.317     | 0.321 | 0.326**      | 0.325

Top-K is consistently the worst!
Slide 17: Mixture Model Parameter Factor

[Chart: effect of the mixture-model parameter alpha on pr@10docs. x axis: alpha in {0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98}; y axis: pr@10docs from 0 to 0.7. Six curves: Top K, Gapped Top K, and K Cluster Centroid, each on HARD and on AP88-89.]

\theta_{Q'} = (1 - \alpha)\, \theta_Q + \alpha\, \theta_F
Slide 18: Summary

• Introduce the active feedback problem
• Propose a preliminary framework and three methods (Top-K, Gapped Top-K, Clustering)
• Study the evaluation strategy
• Experimental results show that
  – Presenting the top K is not the best strategy
  – Clustering can generate fewer, higher-quality feedback examples
Slide 19: Future Work

• Explore other methods for active feedback
• Develop a general framework
• Combine pseudo feedback and active feedback
Slide 20: Thank You!

The End