
1

Carnegie Mellon

Maximum Likelihood Estimation for Information Thresholding

Yi Zhang & Jamie Callan

Carnegie Mellon University

{yiz,callan}@cs.cmu.edu

2

Overview

• Adaptive filtering: definition and challenges
• Threshold based on score distribution and the sampling bias problem
• Maximum likelihood estimation for score distribution parameters
• Results of experiments
• Conclusion

3

Adaptive Filtering

Given an initial description of an information need, a filtering system sifts through a stream of documents and delivers each relevant document to the user as soon as it arrives. Relevance feedback may be available for some of the delivered documents, so the user profile can be updated adaptively.

[Diagram: Filtering System]

4

Adaptive Filtering

Three major problems:
• Learning corpus statistics, such as idf
• Learning the user profile, such as adding or deleting key words and adjusting term weights (scoring method)
• Learning the delivery threshold (binary judgment)

Evaluation measures:
• Linear utility = r1*RR + r2*NR + r3*RN + r4*NN
  Optimizing linear utility => finding P(relevant|document)
  In one dimension: P(relevant|document) = P(relevant|score)
• F measure:

$$F_\beta = \frac{(\beta^2+1)\cdot \mathrm{Precision}\cdot \mathrm{Recall}}{\beta^2\cdot \mathrm{Precision} + \mathrm{Recall}}$$
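As a small illustration of the two evaluation measures on this slide, the sketch below computes a linear utility and an F measure from raw counts. It assumes the usual reading of the abbreviations (RR = relevant retrieved, NR = non-relevant retrieved, RN = relevant not retrieved, NN = non-relevant not retrieved). The function names and the default weights (taken from the T9U' measure used later in the talk) are illustrative choices, not part of the original slides.

```python
def linear_utility(rr, nr, rn, nn, r1=2.0, r2=-1.0, r3=0.0, r4=0.0):
    """Linear utility r1*RR + r2*NR + r3*RN + r4*NN.

    The default weights reproduce T9U' = 2*RR - NR (an assumption made
    here for convenience; any weights can be passed in).
    """
    return r1 * rr + r2 * nr + r3 * rn + r4 * nn


def f_beta(rr, nr, rn, beta=1.0):
    """F measure computed from retrieved/relevant counts."""
    if rr == 0:
        return 0.0  # no relevant document delivered: precision and recall are 0
    precision = rr / (rr + nr)
    recall = rr / (rr + rn)
    b2 = beta * beta
    return (b2 + 1.0) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for one profile.
    print(linear_utility(rr=10, nr=5, rn=20, nn=100))
    print(f_beta(rr=10, nr=5, rn=20))
```

Both functions only need the contingency counts, which is why the threshold is the only thing the filtering system has to learn once the scoring method is fixed.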

5

A Model of Score Distribution: Assumptions and Empirical Justification

Relevant:

$$P(score \mid R = r) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(score-u)^2}{2\sigma^2}}$$

Non-relevant:

$$P(score \mid R = nr) = \lambda\, e^{-\lambda (score - c)}$$

According to other researchers, this is generally true for various statistical searching systems (scoring methods; see Manmatha's paper and Arampatzis's paper).

Figure 1. Density of document scores: TREC9 OHSU Topic 3 and Topic 5 (histograms; x-axis: document score, y-axis: number of documents)

6

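Optimize for Linear Utility Measure: From Score Distribution to Probability of Relevancy

p = P(r): ratio of relevant documents

$$\begin{aligned}
P(r \mid score) &= \frac{P(score \mid r)\,P(r)}{P(score)} \\
&= \frac{P(score \mid r)\,P(r)}{P(score \mid r)\,P(r) + P(score \mid nr)\,P(nr)} \\
&= \frac{p \cdot \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(score-u)^2}{2\sigma^2}}}{p \cdot \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(score-u)^2}{2\sigma^2}} + (1-p)\cdot \lambda\, e^{-\lambda(score-c)}}
\end{aligned}$$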
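The posterior on this slide is Bayes' rule applied to a normal density for relevant scores and a shifted exponential density for non-relevant scores. A minimal sketch follows. The function names and the example parameter values are hypothetical, and treating the exponential density as zero below c is an assumption the slides do not state.

```python
import math


def normal_pdf(x, u, sigma):
    """Density of relevant-document scores (normal with mean u, std sigma)."""
    return math.exp(-((x - u) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def shifted_exp_pdf(x, lam, c):
    """Density of non-relevant scores; assumed to be zero for x < c."""
    return lam * math.exp(-lam * (x - c)) if x >= c else 0.0


def posterior_relevant(score, u, sigma, lam, c, p):
    """P(r | score), where p is the prior ratio of relevant documents."""
    rel = p * normal_pdf(score, u, sigma)
    non = (1.0 - p) * shifted_exp_pdf(score, lam, c)
    total = rel + non
    return rel / total if total > 0.0 else 0.0


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to resemble the score range in Figure 1.
    params = dict(u=0.46, sigma=0.01, lam=100.0, c=0.40, p=0.02)
    for s in (0.42, 0.44, 0.46):
        print(s, posterior_relevant(s, **params))
```

With these made-up parameters the posterior rises from nearly zero at a score of 0.42 to well above one half at 0.46, which is the behavior a delivery threshold exploits.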

7

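Optimize for F Measure: From Score Distribution to Precision and Recall

If set threshold at θ:

$$\mathrm{Precision}(\theta) = \frac{p \int_{\theta}^{\infty} \mathrm{norm}(u,\sigma,x)\,dx}{p \int_{\theta}^{\infty} \mathrm{norm}(u,\sigma,x)\,dx + (1-p)\int_{\theta}^{\infty} \exp(\lambda,x)\,dx}$$

$$\mathrm{Recall}(\theta) = \int_{\theta}^{\infty} \mathrm{norm}(u,\sigma,x)\,dx$$

$$\theta^* = \arg\max_{\theta} F_\beta = \arg\max_{\theta} \frac{(\beta^2+1)\cdot P \cdot R}{\beta^2 \cdot P + R}$$

Here norm(u,σ,x) and exp(λ,x) denote the normal and exponential score densities of relevant and non-relevant documents, and P and R abbreviate Precision(θ) and Recall(θ).

[Figure: score densities of relevant and non-relevant documents with the threshold marked; x-axis: score]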
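Under the score model, precision and recall at a threshold θ depend only on the two tail probabilities, so the F-optimal threshold can be found by a one-dimensional search. The sketch below uses a plain grid search. The grid bounds, step count, function names, and parameter values are illustrative assumptions, and the slides do not say which search procedure the authors used.

```python
import math


def normal_tail(theta, u, sigma):
    """P(score > theta) for relevant documents (normal model)."""
    return 0.5 * math.erfc((theta - u) / (sigma * math.sqrt(2.0)))


def exp_tail(theta, lam, c):
    """P(score > theta) for non-relevant documents (exponential shifted by c)."""
    return math.exp(-lam * (theta - c)) if theta > c else 1.0


def precision_recall(theta, u, sigma, lam, c, p):
    rel = p * normal_tail(theta, u, sigma)
    non = (1.0 - p) * exp_tail(theta, lam, c)
    precision = rel / (rel + non) if rel + non > 0.0 else 0.0
    recall = normal_tail(theta, u, sigma)
    return precision, recall


def best_threshold(u, sigma, lam, c, p, beta=1.0, steps=1000):
    """Grid search for the threshold maximizing F_beta on [c, u + 4*sigma]."""
    lo, hi = c, u + 4.0 * sigma
    b2 = beta * beta
    best_theta, best_f = lo, -1.0
    for k in range(steps + 1):
        theta = lo + (hi - lo) * k / steps
        prec, rec = precision_recall(theta, u, sigma, lam, c, p)
        denom = b2 * prec + rec
        f = (b2 + 1.0) * prec * rec / denom if denom > 0.0 else 0.0
        if f > best_f:
            best_theta, best_f = theta, f
    return best_theta, best_f


if __name__ == "__main__":
    # Hypothetical model parameters.
    print(best_threshold(u=0.46, sigma=0.01, lam=100.0, c=0.40, p=0.02))
```

A grid is crude but makes no smoothness assumptions about F as a function of θ; a golden-section or gradient-based search could replace it once the curve is known to be unimodal.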

8

What Do We Have Now?

• A model for the score distribution
• Algorithms to find the optimal threshold for different evaluation measures, given the model
• Learning task: find the parameters of the model

9

Bias Problem for Parameter Estimation while Filtering

• We only receive feedback for documents that were delivered
• Parameter estimation based on the random sampling assumption is therefore biased
• The sampling criterion depends on the threshold, which changes over time
• Solution: the maximum likelihood principle, which is guaranteed to be unbiased

Figure: Estimation of parameters for relevant document scores of TREC9 OHSU Topic 3 with a fixed dissemination threshold of 0.4435 (curves: estimation based on all relevant documents vs. estimation based on documents delivered; x-axis: document score, y-axis: number of documents)
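The bias can be reproduced with a small simulation: draw relevant-document scores from a normal distribution, keep only those above the dissemination threshold, and compare the naive estimates from the two samples. This is a sketch with made-up values for the true mean and standard deviation; only the threshold 0.4435 comes from the figure caption.

```python
import random
import statistics


def simulate_bias(u=0.46, sigma=0.02, threshold=0.4435, n=20000, seed=0):
    """Return (mean_all, std_all, mean_delivered, std_delivered).

    u and sigma are hypothetical "true" parameters of the relevant scores.
    """
    rng = random.Random(seed)
    scores = [rng.gauss(u, sigma) for _ in range(n)]
    # Feedback is only available for documents scoring above the threshold.
    delivered = [s for s in scores if s > threshold]
    return (
        statistics.mean(scores),
        statistics.pstdev(scores),
        statistics.mean(delivered),
        statistics.pstdev(delivered),
    )


if __name__ == "__main__":
    mean_all, std_all, mean_del, std_del = simulate_bias()
    print("all relevant documents:", mean_all, std_all)
    print("delivered documents only:", mean_del, std_del)
```

Estimating from the delivered documents alone overestimates the mean and underestimates the spread, because the lower tail of the distribution is never observed.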

10

Unbiased Estimation of Parameters Based on Maximum Likelihood Principle (1)

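ML: the best estimate of the parameters is the one that maximizes the probability of the training data:

$$\begin{aligned}
(u^*,\sigma^*,\lambda^*,p^*) &= \arg\max_{H} P(D \mid H) \\
&= \arg\max_{H} \prod_{i=1}^{N} P(D_i \mid H) \\
&= \arg\max_{H} \sum_{i=1}^{N} \log P(D_i \mid H) \\
&= \arg\max_{H} \sum_{i=1}^{N} \log P(Score = Score_i, R = R_i \mid H, Score > \theta_i)
\end{aligned}$$

where H = (u, σ, λ, p) and θ_i is the threshold in effect when document i was delivered.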

11

Unbiased Estimation of Parameters Based on Maximum Likelihood Principle (2)

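For each item inside the sum operation of the previous formula, with H = (u, σ, λ, p) denoting the model parameters:

$$\begin{aligned}
P(Score = Score_i, R = R_i \mid H, Score > \theta_i)
&= \frac{P(Score = Score_i, R = R_i, Score > \theta_i \mid H)}{P(Score > \theta_i \mid H)} \\
&= \frac{P(Score = Score_i, Score > \theta_i \mid R = R_i, H)\, P(R = R_i \mid H)}{P(Score > \theta_i \mid H)} \\
&= \frac{P(Score = Score_i \mid R = R_i, H)\, P(R = R_i \mid H)}{P(Score > \theta_i \mid H)}
\end{aligned}$$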

12

Unbiased Estimation of Parameters Based on Maximum Likelihood Principle (3)

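Calculating the denominator, with H = (u, σ, λ, p) denoting the model parameters:

$$\begin{aligned}
g(u,\sigma,\lambda,p,\theta_i) &= P(Score > \theta_i \mid H) \\
&= P(R = r \mid H)\, P(Score > \theta_i \mid R = r, H) + P(R = nr \mid H)\, P(Score > \theta_i \mid R = nr, H) \\
&= p \int_{\theta_i}^{\infty} P(Score = x \mid R = r)\,dx + (1-p) \int_{\theta_i}^{\infty} P(Score = x \mid R = nr)\,dx \\
&= p \int_{\theta_i}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-u)^2}{2\sigma^2}}\,dx + (1-p) \int_{\theta_i}^{\infty} \lambda\, e^{-\lambda(x-c)}\,dx \\
&= p \int_{\theta_i}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-u)^2}{2\sigma^2}}\,dx + (1-p)\, e^{-\lambda(\theta_i - c)}
\end{aligned}$$

[Figure: score densities of relevant and non-relevant documents with the threshold θ_i marked; x-axis: score]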
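The denominator g is the probability that a document's score exceeds the threshold under the mixture model. A sketch of how it might be evaluated follows. Expressing the remaining normal integral through the complementary error function is a step added here (the slide leaves the integral unevaluated), and the midpoint-rule check, function names, and parameter values are illustrative only.

```python
import math


def mixture_density(x, u, sigma, lam, c, p):
    """Score density p*normal + (1-p)*shifted exponential (zero below c)."""
    normal = math.exp(-((x - u) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)
    expo = lam * math.exp(-lam * (x - c)) if x >= c else 0.0
    return p * normal + (1.0 - p) * expo


def g_closed_form(theta, u, sigma, lam, c, p):
    """P(score > theta) using erfc for the normal tail."""
    normal_tail = 0.5 * math.erfc((theta - u) / (sigma * math.sqrt(2.0)))
    exp_tail = math.exp(-lam * (theta - c)) if theta > c else 1.0
    return p * normal_tail + (1.0 - p) * exp_tail


def g_numeric(theta, u, sigma, lam, c, p, upper=1.0, steps=20000):
    """Midpoint-rule integral of the mixture density over [theta, upper]."""
    h = (upper - theta) / steps
    return h * sum(
        mixture_density(theta + (k + 0.5) * h, u, sigma, lam, c, p)
        for k in range(steps)
    )


if __name__ == "__main__":
    # Hypothetical parameters and threshold.
    args = dict(u=0.46, sigma=0.01, lam=100.0, c=0.40, p=0.02)
    print(g_closed_form(0.44, **args), g_numeric(0.44, **args))
```

The two values should agree closely for thresholds at or above c, which is a cheap sanity check before the closed form is used inside the likelihood.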

13

Unbiased Estimation of Parameters Based on Maximum Likelihood Principle (4)

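$$(u^*,\sigma^*,\lambda^*,p^*) = \arg\max_{(u,\sigma,\lambda,p)} \sum_{i=1}^{N} LP_i$$

• For a relevant document delivered:

$$LP_i = -\frac{(Score_i - u)^2}{2\sigma^2} + \ln\frac{p}{\sigma\, g(u,\sigma,\lambda,p,\theta_i)}$$

• For a non-relevant document delivered:

$$LP_i = -\lambda\,(Score_i - c) + \ln\frac{(1-p)\,\lambda}{g(u,\sigma,\lambda,p,\theta_i)}$$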
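The per-document terms LP_i can be written as a small function that an optimizer would call repeatedly. The sketch below assumes each training example is a (score, is_relevant, threshold) triple and drops the constant -ln sqrt(2π), as the slide's formula does. The data layout and function names are assumptions made for illustration, not the authors' code.

```python
import math


def g(theta, u, sigma, lam, c, p):
    """P(score > theta) under the normal/exponential mixture."""
    normal_tail = 0.5 * math.erfc((theta - u) / (sigma * math.sqrt(2.0)))
    exp_tail = math.exp(-lam * (theta - c)) if theta > c else 1.0
    return p * normal_tail + (1.0 - p) * exp_tail


def log_prob(score, relevant, theta, u, sigma, lam, c, p):
    """LP_i for one delivered document (additive constants dropped)."""
    denom = g(theta, u, sigma, lam, c, p)
    if relevant:
        return -((score - u) ** 2) / (2.0 * sigma ** 2) + math.log(p / (sigma * denom))
    return -lam * (score - c) + math.log((1.0 - p) * lam / denom)


def log_likelihood(samples, u, sigma, lam, c, p):
    """Sum of LP_i over (score, relevant, threshold) triples."""
    return sum(log_prob(s, r, t, u, sigma, lam, c, p) for s, r, t in samples)


if __name__ == "__main__":
    # Hypothetical delivered documents: (score, relevant?, threshold at delivery).
    data = [(0.47, True, 0.45), (0.455, False, 0.45), (0.46, True, 0.44)]
    print(log_likelihood(data, u=0.46, sigma=0.01, lam=100.0, c=0.40, p=0.02))
```

Because each document carries its own threshold, the same function covers the case where the threshold changes over time; maximizing it over (u, σ, λ, p) needs a numerical optimizer with constraints such as σ > 0 and 0 < p < 1.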

14

Relationship to Arampatzis’s Estimation

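If no threshold exists:

$$g(u,\sigma,\lambda,p,\theta_i) = P(Score > \theta_i \mid u,\sigma,\lambda,p) = 1$$

The previous formula becomes:

$$(u^*,\sigma^*,\lambda^*,p^*) = \arg\max_{(u,\sigma,\lambda,p)} \sum_{i=1}^{N} LP_i$$

• For a relevant document delivered:

$$LP_i = -\frac{(Score_i - u)^2}{2\sigma^2} + \ln\frac{p}{\sigma}$$

• For a non-relevant document delivered:

$$LP_i = -\lambda\,(Score_i - c) + \ln\big((1-p)\,\lambda\big)$$

The corresponding result will be the same as Arampatzis's.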
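When the correction term g equals 1, the objective separates and its maximizers have closed forms: setting the derivatives of the sum of -(Score_i - u)²/(2σ²) + ln(p/σ) over relevant documents and -λ(Score_i - c) + ln((1-p)λ) over non-relevant documents to zero gives the sample mean, the (biased) sample standard deviation, the reciprocal of the mean shifted score, and the relevant fraction. A sketch with hypothetical names and data follows; it illustrates the no-threshold case only and is not taken from either paper.

```python
import math


def fit_without_threshold(scores, labels, c):
    """Closed-form maximizers of the no-threshold objective.

    scores: document scores; labels: True for relevant; c: shift of the
    exponential model (treated as a known constant here, an assumption).
    """
    rel = [s for s, r in zip(scores, labels) if r]
    non = [s for s, r in zip(scores, labels) if not r]
    u = sum(rel) / len(rel)
    sigma = math.sqrt(sum((s - u) ** 2 for s in rel) / len(rel))
    lam = len(non) / sum(s - c for s in non)
    p = len(rel) / len(scores)
    return u, sigma, lam, p


if __name__ == "__main__":
    # Made-up scores: three relevant, three non-relevant.
    scores = [0.44, 0.46, 0.48, 0.41, 0.42, 0.43]
    labels = [True, True, True, False, False, False]
    print(fit_without_threshold(scores, labels, c=0.40))
```

These are exactly the estimates one would compute under a random sampling assumption, which is why they become biased once only above-threshold documents are observed.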

15

Unbiased Estimation of Parameters Based on Maximum Likelihood Principle (5)

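• Optimization using the conjugate gradient descent algorithm
• Smoothing using conjugate priors:
  - Prior for p: beta distribution, $p^{\epsilon_1}(1-p)^{\epsilon_2}$
  - Prior for variance: $e^{-\frac{\nu^2}{2\sigma^2}}$
  - Set: $\epsilon_1 = 0.001,\ \epsilon_2 = 0.001,\ \nu = 0.005$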

16

Experimental Methodology (1)

• Optimization goal (similar to the measure used by TREC9): T9U' = 2*Relevant_Retrieved - Non_Relevant_Retrieved = 2RR - NR
• Corresponding rule: deliver if P(R=r|score) > 0.33
• Datasets:
  - OHSUMED data (348566 articles from 1987 to 1991; 63 OHSUMED queries and 500 MeSH headings to simulate user profiles)
  - FT data (210158 articles from the Financial Times, 1991 to 1994; TREC topics 351-400 to simulate user profiles)
• Each profile begins with 2 relevant documents and an initial user profile
• No profile updating, for simplicity
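The 0.33 cutoff follows from comparing the expected utility of delivering a document with that of withholding it: delivering is better when r1*P + r2*(1-P) > r3*P + r4*(1-P), where P = P(R=r|score). A short sketch of that calculation follows; the function name is hypothetical, and the weights r3 = r4 = 0 for T9U' are read off the formula 2RR - NR.

```python
def delivery_cutoff(r1, r2, r3=0.0, r4=0.0):
    """Smallest posterior P(relevant | score) at which delivering pays off.

    Solves r1*P + r2*(1-P) > r3*P + r4*(1-P) for P, assuming that delivering
    a relevant document is a gain (r1 > r3) and that delivering a
    non-relevant one is not (r2 <= r4).
    """
    gain_if_relevant = r1 - r3
    loss_if_nonrelevant = r4 - r2
    if gain_if_relevant <= 0 or loss_if_nonrelevant < 0:
        raise ValueError("weights do not define a meaningful cutoff")
    return loss_if_nonrelevant / (gain_if_relevant + loss_if_nonrelevant)


if __name__ == "__main__":
    # T9U' = 2*RR - NR, i.e. r1 = 2, r2 = -1, r3 = r4 = 0.
    print(delivery_cutoff(2.0, -1.0))
```

For T9U' the cutoff is 1/3, which the slide rounds to 0.33; a utility that penalizes non-relevant deliveries more heavily would push the cutoff toward 1.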

17

Experimental Methodology (2)

Four runs for each profile:
• Run 1: biased estimation of parameters (sampling bias was not considered)
• Run 3: maximum likelihood estimation

Both runs will stop delivering documents if the threshold is set too high, especially in the early stages of filtering. We therefore introduced a minimum delivery ratio: if a profile has not achieved the minimum delivery ratio, its threshold is decreased automatically.
• Run 2: biased estimation + minimum delivery ratio
• Run 4: maximum likelihood estimation + minimum delivery ratio

Time: 21 minutes for the whole process of 63 OHSU topics on 4 years of OHSUMED data (ML algorithm)

18

Results: OHSUMED Data

| | Run 1: Biased estimation | Run 2: Biased estimation + min. delivery ratio | Run 3: Unbiased estimation | Run 4: Unbiased estimation + min. delivery ratio |
| OHSU topics | | | | |
| T9U' utility | 1.84 | 3.25 | 2.7 | 8.17 |
| Avg. docs. delivered per profile | 3.83 | 9.65 | 5.73 | 18.40 |
| Precision | 0.37 | 0.29 | 0.36 | 0.32 |
| Recall | 0.036 | 0.080 | 0.052 | 0.137 |
| MeSH topics | | | | |
| T9U' utility | 1.89 | 4.28 | 2.44 | 13.10 |
| Avg. docs. delivered per profile | 3.51 | 11.82 | 6.22 | 27.91 |
| Precision | 0.42 | 0.39 | 0.40 | 0.34 |
| Recall | 0.018 | 0.046 | 0.025 | 0.068 |

19

Results: Financial Times

| | Run 1: Biased estimation | Run 2: Biased estimation + min. delivery ratio | Run 3: Unbiased estimation | Run 4: Unbiased estimation + min. delivery ratio |
| T9U' utility | 1.44 | -0.209 | 0.65 | 0.84 |
| Avg. docs. delivered per profile | 9.58 | 10.44 | 9.05 | 12.27 |
| Precision | 0.20 | 0.17 | 0.22 | 0.26 |
| Recall | 0.161 | 0.167 | 0.15 | 0.193 |

20

Result Analysis: Difference Between Run 4 and Run 2 on TREC9 OHSU Topics

• For most of the topics, ML (Run 4) delivered more documents than Run 2
• For some of the topics, ML (Run 4) has a much higher utility than Run 2, while the two runs are similar on most of the other topics

[Figure, two panels with topics on the x-axis: "Utility: ML - Biased" and "Docs delivered: ML - Biased"]

21

Conclusion

• Score density distribution
  - Relevant documents: normal distribution
  - Non-relevant documents: exponential distribution
• The bias problem due to non-random sampling can be solved based on the maximum likelihood principle
• Significant improvement in the TREC-9 filtering task
• Future work
  - Thresholding while updating profiles
  - Non-random sampling problem in other tasks