Active Feedback in Ad Hoc IR Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign


Page 1

Active Feedback in Ad Hoc IR

Xuehua Shen, ChengXiang Zhai

Department of Computer Science

University of Illinois, Urbana-Champaign

Page 2

Normal Relevance Feedback (RF)

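[Figure: relevance feedback loop. The User issues a Query to the Retrieval System, which searches the Document Collection and returns the Top K Results with scores (d1 3.5, d2 2.4, …, dk 0.5). The User's Judgments (d1 +, d2 -, …, dk -) go back to the Retrieval System as Feedback.]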

Page 3

Document Selection in RF

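[Figure: the relevance feedback loop again (Query, Retrieval System, Document Collection, User), with the Top K Results replaced by the question "Which k docs to present?". The User's Judgments (d1 +, d2 -, …, dk -) still go back to the Retrieval System as Feedback.]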

Can we do better than just presenting top-K? (Consider diversity…)

Page 4

Active Feedback (AF)

An IR system actively selects documents for obtaining relevance judgments.

If a user is willing to judge K documents, which K documents should we present in order to maximize learning effectiveness?

Page 5

Outline

• Framework and specific methods

• Experiment design and results

• Summary and future work

Page 6

A Framework for Active Feedback

• Consider active feedback as a decision problem

– Decide K documents (D) for relevance judgment

• Formalize it as an optimization problem

– Optimize the expected learning benefits (loss) by requesting relevance judgments on D from the user

• Consider two cases of loss function according to the interaction between documents

Page 7

Formula of the Framework

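$$D^{*} = \arg\min_{D} \int_{\Theta} L(D, \theta)\, p(\theta \mid U, q, C)\, d\theta$$

$$L(D, \theta) = \sum_{j} l(D, j, \theta)\, p(j \mid D, U, \theta) = \sum_{j} l(D, j, \theta) \prod_{i=1}^{k} p(j_i \mid d_i, U, \theta)$$

Callouts on the slide:

• $l(D, j, \theta)$: value of documents for learning

• $\sum_{j}$: different judgments

• $\prod_{i=1}^{k} p(j_i \mid d_i, U, \theta)$: independent judgment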

Page 8

Independent Loss

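$$L(D, \theta) = \sum_{j} l(D, j, \theta) \prod_{i=1}^{k} p(j_i \mid d_i, U, \theta)$$

Independent loss:

$$l(D, j, \theta) = \sum_{i=1}^{k} l(d_i, j_i, \theta)$$

$$L(D, \theta) = \sum_{j} \left[ \sum_{i=1}^{k} l(d_i, j_i, \theta) \right] \prod_{i=1}^{k} p(j_i \mid d_i, U, \theta)$$

$$D^{*} = \arg\min_{D} \sum_{i=1}^{k} \int_{\Theta} \sum_{j_i} l(d_i, j_i, \theta)\, p(j_i \mid d_i, U, \theta)\, p(\theta \mid U, q, C)\, d\theta$$

Expected loss of each document:

$$r(d_i) = \int_{\Theta} \sum_{j_i} l(d_i, j_i, \theta)\, p(j_i \mid d_i, U, \theta)\, p(\theta \mid U, q, C)\, d\theta$$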
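The slide moves from the joint expected loss to a sum of per-document terms without showing the intermediate step. A sketch of that step, assuming only what the slide assumes (an additive loss and judgments that are independent given θ); the auxiliary index m is introduced here purely to keep the two products apart:

```latex
\begin{aligned}
L(D,\theta)
  &= \sum_{j} \Big[ \sum_{i=1}^{k} l(d_i, j_i, \theta) \Big]
     \prod_{m=1}^{k} p(j_m \mid d_m, U, \theta) \\
  &= \sum_{i=1}^{k} \sum_{j_i} l(d_i, j_i, \theta)\, p(j_i \mid d_i, U, \theta)
     \prod_{m \neq i} \underbrace{\sum_{j_m} p(j_m \mid d_m, U, \theta)}_{=\,1} \\
  &= \sum_{i=1}^{k} \sum_{j_i} l(d_i, j_i, \theta)\, p(j_i \mid d_i, U, \theta)
\end{aligned}
```

Because every document contributes its own term, minimizing the total amounts to picking the K documents with the smallest individual expected loss.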

Page 9

Independent Loss (cont.)

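$$r(d_i) = \int_{\Theta} \sum_{j_i} l(d_i, j_i, \theta)\, p(j_i \mid d_i, U, \theta)\, p(\theta \mid U, q, C)\, d\theta$$

Uncertainty Sampling (more uncertain, more useful)

$$l(d_i, 1, \theta) = \log p(R = 1 \mid d_i, \theta), \qquad l(d_i, 0, \theta) = \log p(R = 0 \mid d_i, \theta), \qquad \forall d_i \in C$$

$$r(d_i) = -\int_{\Theta} H(R \mid d_i, \theta)\, p(\theta \mid U, q, C)\, d\theta$$

Top K (relevant docs more useful than non-relevant docs)

$$\forall d_i \in C: \quad l(d_i, 1, \theta) = C_1, \qquad l(d_i, 0, \theta) = C_0, \qquad C_1 < C_0$$

$$r(d_i) = C_0 + (C_1 - C_0) \int_{\Theta} p(j_i = 1 \mid d_i, U, \theta)\, p(\theta \mid U, q, C)\, d\theta$$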
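The two independent-loss strategies differ only in the per-document score they sort by. A minimal Python sketch, assuming a single point estimate of the relevance probability per document instead of the integral over θ; the function names and the probabilities are hypothetical and not taken from the slides:

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a binary relevance variable with P(R=1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def select_top_k(rel_prob, k):
    """Top K: present the k documents most likely to be relevant."""
    return sorted(rel_prob, key=lambda d: -rel_prob[d])[:k]


def select_uncertain(rel_prob, k):
    """Uncertainty sampling: present the k documents with the highest entropy."""
    return sorted(rel_prob, key=lambda d: -binary_entropy(rel_prob[d]))[:k]


# Hypothetical point estimates of p(R=1 | d); illustrative values only.
rel_prob = {"d1": 0.95, "d2": 0.80, "d3": 0.55, "d4": 0.40, "d5": 0.10}
top_k = select_top_k(rel_prob, 2)          # the two highest probabilities
uncertain = select_uncertain(rel_prob, 2)  # the two probabilities closest to 0.5
```

With these numbers the two strategies pick disjoint sets, which illustrates why the choice of loss function matters.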

Page 10

Dependent Loss

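• MMR

• Gapped Top K: pick one doc every G+1 docs

• K Cluster Centroid: first select the top N docs of baseline retrieval, then cluster the N docs into K clusters

The dependent loss $L(D, U, \theta)$ combines two terms:

– a relevance term, $\sum_{i=1}^{k} p(j_i = 1 \mid d_i, U, \theta)$ (more relevant, more useful)

– a diversity term over the selected set D (more diverse, more useful)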
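Gapped Top K is simple enough to sketch directly. The function below assumes that selection starts at rank 1 and that G documents are skipped between picks; the starting rank and the names `gapped_top_k`, `k`, and `gap` are illustrative assumptions rather than details given on the slide.

```python
def gapped_top_k(ranked_docs, k, gap):
    """Pick one document every (gap + 1) positions of the ranked list."""
    return ranked_docs[::gap + 1][:k]


# Ranks 1..16 stand in for the baseline retrieval results.
ranked = list(range(1, 17))
plain = gapped_top_k(ranked, 4, 0)   # gap 0 reduces to ordinary Top K
gapped = gapped_top_k(ranked, 4, 2)  # ranks 1, 4, 7, 10
```

A larger gap reaches further down the ranking, trading some expected relevance for diversity.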

Page 11

Illustration of Three AF Methods

Ranked list: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 …

• Top-K (normal feedback)

• Gapped Top-K

• K-Cluster Centroid

Aiming at high diversity …
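One way to realize K-Cluster Centroid selection is a small k-medoids routine that returns one representative document per cluster. This is only a sketch under stated assumptions: the slides do not say which clustering algorithm or distance is used, so the Euclidean distance, the evenly spaced initialization, and the toy 2-D vectors below are all hypothetical, and the points are assumed to be distinct.

```python
import math


def k_medoids(points, k, dist, max_iter=20):
    """Return the indices of k medoids, one per cluster (points assumed distinct)."""
    # Deterministic start: spread the initial medoids evenly over the ranked list.
    medoids = [i * len(points) // k for i in range(k)]
    for _ in range(max_iter):
        # Assign every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)
        # Move each medoid to the member with the smallest total in-cluster distance.
        new_medoids = [
            min(members, key=lambda c: sum(dist(points[c], points[o]) for o in members))
            if members else m
            for m, members in clusters.items()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


# Toy vectors for the top N = 9 documents, forming three loose groups.
docs = [
    (0.0, 0.0), (1.0, 0.0), (0.5, 0.1),
    (5.0, 5.0), (6.0, 5.0), (5.5, 5.1),
    (10.0, 0.0), (11.0, 0.0), (10.5, 0.1),
]
selected = k_medoids(docs, 3, math.dist)  # indices of the documents to present
```

Presenting the medoid of each cluster gives the user one document per topical group, which is the diversity the slide aims for.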

Page 12

Evaluating Active Feedback

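[Figure: evaluation pipeline. A Query produces Initial Results (No Feedback). A Select K Docs step (Top-k, Gapped, Clustering) picks K docs, which are looked up in the Judgment File to obtain Judged Docs (+ / -). The Judged Docs drive Feedback, which produces the Feedback Results.]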

Page 13

Retrieval Methods (Lemur toolkit)

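[Figure: retrieval pipeline.]

• KL Divergence: Query Q and Document D are represented as $\theta_Q$ and $\theta_D$, and Results are ranked by $D(\theta_Q \,\|\, \theta_D)$

• Mixture Model Feedback: the Feedback Docs F = {d1, …, dn} chosen by Active Feedback give $\theta_F$ (only learn from relevant docs)

• Query update: $\theta_{Q'} = (1 - \alpha)\,\theta_Q + \alpha\,\theta_F$

• Default parameter settings unless otherwise stated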
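The query update on this slide appears to be the usual linear interpolation of the query model with the feedback model, followed by KL-divergence scoring. The sketch below illustrates both steps with word distributions stored as plain dictionaries; it does not use the Lemur API, and the function names and toy distributions are hypothetical.

```python
import math


def interpolate(query_model, feedback_model, alpha):
    """theta_Q' = (1 - alpha) * theta_Q + alpha * theta_F over the joint vocabulary."""
    vocab = set(query_model) | set(feedback_model)
    return {
        w: (1.0 - alpha) * query_model.get(w, 0.0) + alpha * feedback_model.get(w, 0.0)
        for w in vocab
    }


def kl_divergence(query_model, doc_model):
    """D(theta_Q || theta_D); doc_model is assumed smoothed (nonzero on query words)."""
    return sum(
        p * math.log(p / doc_model[w]) for w, p in query_model.items() if p > 0.0
    )


# Toy distributions, not taken from the slides.
theta_q = {"active": 0.5, "feedback": 0.5}
theta_f = {"feedback": 0.5, "relevance": 0.5}
updated = interpolate(theta_q, theta_f, alpha=0.5)

doc_models = {
    "doc_a": {"active": 0.25, "feedback": 0.5, "relevance": 0.25},
    "doc_b": {"active": 0.5, "feedback": 0.25, "relevance": 0.25},
}
# A smaller divergence means a better match, so sort in ascending order.
ranking = sorted(doc_models, key=lambda d: kl_divergence(updated, doc_models[d]))
```

Setting alpha to 0 ignores the feedback documents entirely, while values close to 1 let the feedback model dominate the original query.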

Page 14

Comparison of Three AF Methods

Collection   Active FB Method   #AFRel per topic   MAP        Pr@10doc
HARD 2003    Baseline           /                  0.301      0.501
HARD 2003    Pseudo FB          /                  0.320      0.515
HARD 2003    Top-K              3.0                0.325      0.527
HARD 2003    Gapped             2.6                0.330**    0.548*
HARD 2003    Clustering         2.4                0.332      0.565
AP88-89      Baseline           /                  0.201      0.326
AP88-89      Pseudo FB          /                  0.218      0.343
AP88-89      Top-K              2.2                0.228      0.351
AP88-89      Gapped             1.5                0.234*     0.389**
AP88-89      Clustering         1.3                0.237**    0.393**

(MAP and Pr@10doc include judged docs.)

Top-K is the worst!

Clustering uses fewest relevant docs
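For reference, the two metrics in the table can be computed as follows. This is a generic sketch of mean average precision and precision at a cutoff, not the evaluation script behind the reported numbers; the document ids and judgments are made up.

```python
def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top k ranked documents that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical topics.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6"], {"d6"}),
]
map_score = mean_average_precision(runs)
p_at_2 = precision_at_k(runs[0][0], runs[0][1], k=2)
```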

Page 15

Appropriate Evaluation of Active Feedback

• Original DB with judged docs (AP88-89, HARD)

– Can’t tell if the ranking of un-judged documents is improved

• Original DB without judged docs

– Different methods have different test documents

• New DB (AP88-89, AP90)

– See the learning effect more explicitly

– But the docs must be similar to original docs
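The "without judged docs" setting can be illustrated by filtering the judged documents out of a ranking before scoring it. The helper name and the toy ranking below are hypothetical; the point is only that each selection method removes a different set, so the remaining test documents differ between methods.

```python
def remove_judged(ranking, judged):
    """Drop documents the user already judged before scoring the ranking."""
    return [d for d in ranking if d not in judged]


# The same toy ranking, with different judged sets per selection method.
ranking = ["d1", "d2", "d3", "d4", "d5"]
judged_top_k = {"d1", "d2"}
judged_gapped = {"d1", "d3"}

residual_top_k = remove_judged(ranking, judged_top_k)
residual_gapped = remove_judged(ranking, judged_gapped)
```

Evaluating on a separate collection avoids this mismatch because every method is then scored on the same documents.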

Page 16

Retrieval Performance on AP90 Dataset

Method   Baseline   Pseudo FB   Top K   Gapped Top K   K Cluster Centroid
MAP      0.203      0.220       0.220   0.222          0.223
pr@10    0.295      0.317       0.321   0.326**        0.325

Top-K is consistently the worst!

Page 17

Mixture Model Parameter Factor

[Figure: effect of the mixture model parameter alpha on performance. x-axis: alpha (0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98); y-axis: pr@10docs (0 to 0.7). Series: Top K, Gapped Top K, and K Cluster Centroid, each on HARD and on AP88-89. Annotation: $\theta_{Q'} = (1 - \alpha)\,\theta_Q + \alpha\,\theta_F$]

Page 18

Summary

• Introduce the active feedback problem

• Propose a preliminary framework and three methods (Top-k, Gapped Top-k, Clustering)

• Study the evaluation strategy

• Experiment results show that

– Presenting the top-k is not the best strategy

– Clustering can generate fewer, higher quality feedback examples

Page 19

Future Work

• Explore other methods for active feedback

• Develop a general framework

• Combine pseudo feedback and active feedback

Page 20

Thank you !

The End