Michael Bendersky, W. Bruce Croft Dept. of Computer Science Univ. of Massachusetts Amherst Amherst,...

Michael Bendersky, W. Bruce Croft
Dept. of Computer Science, Univ. of Massachusetts Amherst, Amherst, MA
SIGIR 2012



Page 2: Outline

• Motivation
• Query Hypergraphs
• Ranking Documents
• Parameter Estimation
• Evaluation
• Conclusion

Page 3: Motivation

• Goal: retrieve more relevant documents for users
• Query representation:
  • bag-of-words
  • term dependencies
  • concept dependencies (this paper)

Page 4: Example

• "Provide information on the use of dogs worldwide for law enforcement purposes."

• Bag-of-words: {provide, information, dog, …}
• Term dependency: {(provide, information), (law, enforcement)}
• Concept dependency: {(dog, law enforcement), …}

Page 5: Example (cont.)

• "Provide information on the use of dogs worldwide for law enforcement purposes."

{provide, information, (law, enforcement)}
{(dog, law enforcement)}

Page 6: Modeling Concept Dependencies

• Use query hypergraphs
  1. Build linguistic structures over the query, e.g. "members of the rock group nirvana"
  2. Each element of a structure can be represented as a concept

Page 7: Query Hypergraphs

• Query hypergraph for the query "international art crime"

D: a document
i, a, c, ac: the concepts "international", "art", "crime", "art crime"

V = {D, i, a, c, ac}

E = {({i},D), ({a},D), ({c},D), ({ac},D), ({i,a,c,ac},D)}

Each element of E is a hyperedge.
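The vertex and hyperedge sets for this example can be built mechanically. The following is a minimal Python sketch, assuming the four concepts are "international", "art", "crime" and the phrase "art crime"; the function name `build_query_hypergraph` is hypothetical and not taken from the slides.

```python
from typing import FrozenSet, List, Set, Tuple

# A hyperedge connects a set of query concepts to the document vertex.
Hyperedge = Tuple[FrozenSet[str], str]

def build_query_hypergraph(concepts: List[str], doc: str = "D") -> Tuple[Set[str], List[Hyperedge]]:
    """Return (V, E): the vertices and hyperedges of a query hypergraph."""
    vertices = {doc, *concepts}
    # One hyperedge per single concept.
    edges: List[Hyperedge] = [(frozenset([k]), doc) for k in concepts]
    # One hyperedge over the entire concept set.
    edges.append((frozenset(concepts), doc))
    return vertices, edges

V, E = build_query_hypergraph(["international", "art", "crime", "art crime"])
```

With four concepts this yields five vertices and five hyperedges, matching the sets V and E listed on the slide.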

Page 8: Query Hypergraph Induction

• Three types of structures
  • Query term structure: individual query words
  • Phrase structure: bigrams (order matters)
  • Proximity structure: arbitrary subsets of query terms
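A rough Python sketch of how the three structures could be extracted from a query string. It assumes whitespace tokenization, adjacent-word bigrams for the phrase structure, and a cap on subset size for the proximity structure (the cap `max_subset_size` is an assumption added here to keep the number of subsets small; the slide says only "arbitrary subsets").

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

def induce_structures(query: str, max_subset_size: int = 2):
    """Return (term, phrase, proximity) structures for a query."""
    terms = query.lower().split()
    # Query term structure: the individual query words.
    term_structure: List[str] = list(terms)
    # Phrase structure: ordered pairs of adjacent words.
    phrase_structure: List[Tuple[str, str]] = list(zip(terms, terms[1:]))
    # Proximity structure: unordered subsets of 2..max_subset_size terms.
    proximity_structure: List[FrozenSet[str]] = [
        frozenset(subset)
        for size in range(2, max_subset_size + 1)
        for subset in combinations(terms, size)
    ]
    return term_structure, phrase_structure, proximity_structure

qt, ph, pr = induce_structures("international art crime")
```

Here the phrase structure keeps word order, so ("art", "crime") is a member but ("crime", "art") is not, while the proximity structure treats {international, crime} as an unordered set.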

Page 9: Hyperedges

• Local hyperedges: $(\{k\}, D)$ for each $k \in K^Q$
• Global hyperedge: $(K^Q, D)$

$k$: a concept
$K^Q$: set of query concepts

Page 10: Ranking Documents

• Relevance score

Q: a query
D: a document
e: a hyperedge
E: set of hyperedges

Factor: $\phi_e(k_e, D)$
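A sketch of how the factors could combine into a relevance score. The transcript lists only the symbols, so the form below is an assumption: the score is taken to be the product of one factor per hyperedge, which ranks documents identically to the sum of the log-factors. The name $sc(Q, D)$ and the subscripted $k_e$ (the concepts attached to hyperedge $e$) are notation chosen here.

```latex
sc(Q, D) \;=\; \prod_{e \in E} \phi_e(k_e, D)
\;\stackrel{rank}{=}\; \sum_{e \in E} \log \phi_e(k_e, D)
```

Working in log space turns the product into a sum, so each hyperedge contributes an additive term to the final score.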

Page 11: Local Factors

$\lambda(k)$: the importance weight of the concept $k$

$f(k, D)$: a matching function between the concept $k$ and the document $D$

Page 12: Matching Function

$$f(k, D) = \log \frac{tf(k, D) + \mu \, \frac{tf(k, C)}{|C|}}{|D| + \mu}$$

$C$: the collection
$|D|$: the number of terms in the document
$|C|$: the number of terms in the collection
$\mu$: Dirichlet smoothing parameter
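A small Python sketch of a Dirichlet-smoothed matching function of the form f(k, D) = log((tf(k, D) + μ · tf(k, C) / |C|) / (|D| + μ)). It assumes single-term concepts, documents given as token lists, and collection statistics given as a `Counter`; the default `mu=2500.0` and the tiny floor that avoids log(0) are assumptions for illustration, not values from the slides.

```python
import math
from collections import Counter
from typing import List

def matching_score(concept: str, doc_tokens: List[str],
                   coll_counts: Counter, coll_len: int, mu: float = 2500.0) -> float:
    """Dirichlet-smoothed log score of a single-term concept in a document."""
    tf_d = doc_tokens.count(concept)     # tf(k, D)
    tf_c = coll_counts.get(concept, 0)   # tf(k, C)
    smoothed = tf_d + mu * tf_c / coll_len
    # Floor guards against log(0) when the concept never occurs in the collection.
    return math.log(max(smoothed, 1e-12) / (len(doc_tokens) + mu))

docs = [["police", "dog", "unit"], ["police", "horse", "unit"]]
coll_counts = Counter(token for doc in docs for token in doc)
coll_len = sum(coll_counts.values())
with_dog = matching_score("dog", docs[0], coll_counts, coll_len, mu=10.0)
without_dog = matching_score("dog", docs[1], coll_counts, coll_len, mu=10.0)
```

The document that contains "dog" scores higher, but the one that does not still receives a finite score because of the collection-level smoothing term.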

Page 13: Global Factor

• Considers the dependency between the entire set of query concepts

$\pi_D$: the highest-scoring passage from the document

Concept dependencies span a much longer range than term dependencies.

$\lambda(k, K^Q)$: the importance weight of concept $k$ in the context of the entire set of query concepts $K^Q$ (with the concept matched in the passage $\pi_D$)
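A Python sketch of passage-level scoring in the spirit of the global factor. It assumes fixed-length sliding-window passages, single-term concepts, and a Dirichlet-smoothed log matching function applied to the passage instead of the whole document; the window length, the weights, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def passage_match(concept: str, passage: List[str],
                  coll_counts: Counter, coll_len: int, mu: float) -> float:
    """Smoothed log score of a concept inside one passage."""
    smoothed = passage.count(concept) + mu * coll_counts.get(concept, 0) / coll_len
    return math.log(max(smoothed, 1e-12) / (len(passage) + mu))

def global_score(weights: Dict[str, float], doc_tokens: List[str],
                 coll_counts: Counter, coll_len: int,
                 passage_len: int = 4, mu: float = 10.0) -> float:
    """Score of the best passage: max over windows of sum_k weight[k] * f(k, passage)."""
    best = float("-inf")
    for start in range(max(len(doc_tokens) - passage_len, 0) + 1):
        passage = doc_tokens[start:start + passage_len]
        score = sum(w * passage_match(k, passage, coll_counts, coll_len, mu)
                    for k, w in weights.items())
        best = max(best, score)
    return best

doc_a = ["dog", "the", "the", "the", "the", "the", "law"]   # concepts far apart
doc_b = ["the", "the", "dog", "law", "the", "the", "the"]   # concepts close together
coll_counts = Counter(doc_a + doc_b)
coll_len = sum(coll_counts.values())
weights = {"dog": 1.0, "law": 1.0}
score_a = global_score(weights, doc_a, coll_counts, coll_len)
score_b = global_score(weights, doc_b, coll_counts, coll_len)
```

Both toy documents contain the same words, yet `doc_b` scores higher because a single passage covers both concepts, which is the kind of evidence a document-level bag-of-words score cannot see.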

Page 14: Example

{(dog, law enforcement)}

The two concepts need not appear in the same sentence, but they co-occur in a larger text passage.

Page 15: Query Hypergraph Parameterization

• Goal: parameterize the concept weights (local and global): $\lambda(k)$ and $\lambda(k, K^Q)$

• Parameterization by structure
• Parameterization by concept

Page 16: Parameterization by Structure

$\sigma$: a structure

Page 17: Parameterization by Concept

• Parameterize the concept weights based on the concepts themselves

concept importance feature

estimation
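To contrast the two parameterizations, here is a short Python sketch. It assumes that parameterization by structure gives every concept in a structure the same weight, and that parameterization by concept computes the weight as a linear combination of importance features. The linear form, the feature names (`bias`, `log_idf`) and all numeric values are hypothetical.

```python
from typing import Dict

# By structure: one shared weight per structure type (hypothetical values).
structure_weights: Dict[str, float] = {"term": 0.8, "phrase": 0.1, "proximity": 0.1}

def weight_by_structure(structure: str) -> float:
    """Every concept belonging to `structure` gets the same weight."""
    return structure_weights[structure]

def weight_by_concept(features: Dict[str, float], feature_weights: Dict[str, float]) -> float:
    """Weight of one concept as a weighted sum of its importance features."""
    return sum(feature_weights[name] * value for name, value in features.items())

feature_weights = {"bias": 0.5, "log_idf": 0.2}   # hypothetical estimated weights
dog_features = {"bias": 1.0, "log_idf": 3.0}      # hypothetical feature values
dog_weight = weight_by_concept(dog_features, feature_weights)
```

Under the first scheme the number of free parameters equals the number of structures; under the second it equals the number of features, and two concepts in the same structure can receive different weights.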

Page 18: Parameter Estimation

• Optimize a target metric (mean average precision)
• Rely on a large collection
• Use the coordinate ascent algorithm, a coordinate-level hill-climbing search
• Repeatedly cycle through each of the parameters $\lambda(\cdot)$, while holding all other parameters fixed
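A generic Python sketch of coordinate ascent as described above: change one parameter at a time, keep the change only if the objective improves, and stop when a full cycle brings no improvement. The fixed step size and the smooth toy objective are assumptions for illustration; in the setting of the slides the objective would be mean average precision on training queries.

```python
from typing import Callable, List, Sequence, Tuple

def coordinate_ascent(objective: Callable[[List[float]], float],
                      params: Sequence[float],
                      step: float = 0.1,
                      max_rounds: int = 50) -> Tuple[List[float], float]:
    """Maximize objective(params) by adjusting one coordinate at a time."""
    current = list(params)
    best = objective(current)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(current)):        # cycle through the parameters
            for delta in (step, -step):      # all other parameters stay fixed
                candidate = current[:]
                candidate[i] += delta
                value = objective(candidate)
                if value > best:
                    current, best, improved = candidate, value, True
        if not improved:                     # no coordinate helped: local optimum
            break
    return current, best

def toy_objective(p: List[float]) -> float:
    """Stand-in for the target metric, maximal at (1.0, -0.5)."""
    return -((p[0] - 1.0) ** 2) - ((p[1] + 0.5) ** 2)

best_params, best_value = coordinate_ascent(toy_objective, [0.0, 0.0])
```

Because each move only needs the objective value, the same loop works for non-differentiable metrics such as average precision, at the cost of one evaluation per candidate.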

Page 19: Parameter Estimation (cont.)

1. Optimize the local component (the weights $\lambda(k)$)
2. Retrieve the top thousand documents
3. Optimize the global component (the weights $\lambda(k, K^Q)$)

Page 20: Parameter Estimation (cont.)

(Robust04 collection)

Page 21: Evaluation (testing)

• Search engine: Indri
• Test collections
• Queries

Page 22: Evaluation (evaluation metrics)

• MAP (mean average precision)
  e.g. Topic 1: 3 relevant documents (at ranks 1, 3, 5): AP = (1/1 + 2/3 + 3/5)/3

• ERR@k (expected reciprocal rank, k = 20)

$$\mathrm{ERR@}k = \sum_{i=1}^{k} \frac{1}{i} \, R(g_i) \prod_{j=1}^{i-1} \bigl(1 - R(g_j)\bigr)$$

$g \in \{0, 1, 2, 3, 4\}$, $R(g) = (2^g - 1)/16$

$R(g_i)$: the user is satisfied by the document at rank $i$

$\prod_{j=1}^{i-1} (1 - R(g_j))$: the user was not satisfied by the previous documents (ranks 1 to $i-1$)
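Both metrics are short to compute. The Python sketch below follows the definitions on this slide: average precision from the ranks of the relevant documents, and ERR@k with R(g) = (2^g − 1)/16 for grades 0 to 4. The function names and the example grade list are illustrative choices, not taken from the slides.

```python
from typing import List

def average_precision(relevant_ranks: List[int], num_relevant: int) -> float:
    """AP from the 1-based ranks at which relevant documents were retrieved."""
    ranks = sorted(relevant_ranks)
    # The i-th relevant document (1-based) found at rank r contributes precision i / r.
    return sum((i + 1) / r for i, r in enumerate(ranks)) / num_relevant

def err_at_k(grades: List[int], k: int = 20, max_grade: int = 4) -> float:
    """ERR@k for a ranked list of relevance grades in 0..max_grade."""
    err = 0.0
    p_not_satisfied = 1.0   # probability the user reached this rank unsatisfied
    for i, g in enumerate(grades[:k], start=1):
        r = (2 ** g - 1) / 2 ** max_grade   # R(g) = (2^g - 1) / 16 for max_grade = 4
        err += p_not_satisfied * r / i
        p_not_satisfied *= 1.0 - r
    return err

ap = average_precision([1, 3, 5], num_relevant=3)   # (1/1 + 2/3 + 3/5) / 3
err = err_at_k([4, 0, 2])                           # grades of the top three documents
```

MAP is then the mean of `average_precision` over all topics. In the ERR example the highly relevant first document absorbs most of the probability mass, so the grade-2 document at rank 3 adds very little.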

Page 23: Evaluation (retrieval performance)

Page 24: Conclusion

• Models arbitrary term dependencies as concepts
• Uses passage-level evidence to model the dependencies between the concepts
• Assigns weights to both concepts and concept dependencies
• The proposed retrieval framework improves retrieval effectiveness for verbose natural language queries.