Information Retrieval - Stanford University · 5/21/17 1 Introduction to Information Retrieval...


Introduction to Information Retrieval

BM25, BM25F, and User Behavior

Chris Manning and Pandu Nayak

Summary – BIM [Robertson & Spärck-Jones 1976]

§ Boils down to

$$RSV^{BIM} = \sum_{x_i = q_i = 1} c_i^{BIM}$$

where

$$c_i^{BIM} = \log \frac{p_i (1 - r_i)}{(1 - p_i)\, r_i} \quad \text{(log odds ratio)}$$

$$\begin{array}{l|cc} & \text{document relevant } (R=1) & \text{not relevant } (R=0) \\ \hline \text{term present } x_i = 1 & p_i & r_i \\ \text{term absent } x_i = 0 & 1 - p_i & 1 - r_i \end{array}$$

§ With constant $p_i = 0.5$, simplifies to IDF weighting:

$$RSV = \sum_{x_i = q_i = 1} \log \frac{N}{n_i}$$
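The following is a minimal Python sketch of the BIM term weight and its IDF-like simplification. It assumes that $r_i$ is estimated as $n_i/N$, on the grounds that almost all documents are non-relevant. This is a common approximation but the slide does not state it. The terms and counts are hypothetical.

```python
import math

def c_bim(p: float, r: float) -> float:
    """BIM term weight: log odds ratio log[p (1 - r) / ((1 - p) r)].

    p: P(term present | relevant); r: P(term present | non-relevant).
    """
    return math.log(p * (1 - r) / ((1 - p) * r))

def rsv_bim(query_terms, doc_terms, p, r):
    """RSV^BIM: sum of c_i over terms present in both query and document (x_i = q_i = 1)."""
    return sum(c_bim(p[t], r[t]) for t in query_terms if t in doc_terms)

# Hypothetical collection: N documents, n[t] of which contain term t.
N = 1000
n = {"okapi": 10, "retrieval": 50}
p = {t: 0.5 for t in n}       # constant p_i = 0.5
r = {t: n[t] / N for t in n}  # assumption: r_i estimated as n_i / N

for t in n:
    # With p_i = 0.5 the weight is log((N - n_i) / n_i), close to log(N / n_i) for small n_i.
    print(t, round(c_bim(p[t], r[t]), 3), round(math.log(N / n[t]), 3))
print(round(rsv_bim({"okapi", "retrieval"}, {"okapi", "system"}, p, r), 3))
```

Only "okapi" occurs in both the query and the document, so the RSV is just that term's weight.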

Graphical model for BIM – Bernoulli NB

[Figure: graphical model over query terms i ∈ q]

§ Binary variables $x_i = (tf_i \neq 0)$

A key limitation of the BIM

§ BIM, like much of original IR, was designed for titles or abstracts, and not for modern full-text search
§ We want to pay attention to term frequency and document lengths, just like in other models we discuss
§ Want

$$c_i = c_i(tf_i) = \log \frac{p_{tf}\, r_0}{p_0\, r_{tf}}$$

where $p_{tf} = p(TF_i = tf \mid R = 1)$ and $r_{tf} = p(TF_i = tf \mid R = 0)$

§ Want some model of how often terms occur in docs

1. Okapi BM25 [Robertson et al. 1994, TREC City U.]

§ BM25 “Best Match 25” (they had a bunch of tries!)
§ Developed in the context of the Okapi system
§ Started to be increasingly adopted by other teams during the TREC competitions
§ It works well
§ Goal: be sensitive to term frequency and document length while not adding too many parameters
§ (Robertson and Zaragoza 2009; Spärck Jones et al. 2000)


Generative model for documents

§ Words are drawn independently from the vocabulary using a multinomial distribution

[Figure: the text "... the draft is that each team is given a position in the draft …" generated from vocabulary words such as basic, team, each, that, of, is, the, draft, design, nfl, football, given, annual]
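The generative story can be simulated in a few lines of Python. Each token is drawn independently from one fixed distribution over the vocabulary. The vocabulary and its probabilities are made up for illustration.

```python
import random
from collections import Counter

def sample_document(vocab_probs: dict, length: int, seed: int = 0) -> list:
    """Draw `length` tokens independently from a multinomial (unigram) model."""
    rng = random.Random(seed)
    words = list(vocab_probs)
    weights = [vocab_probs[w] for w in words]
    return rng.choices(words, weights=weights, k=length)

# Hypothetical vocabulary distribution (probabilities sum to 1).
vocab = {"the": 0.30, "is": 0.15, "draft": 0.10, "team": 0.10, "in": 0.10,
         "each": 0.05, "given": 0.05, "position": 0.05, "nfl": 0.05,
         "football": 0.05}
doc = sample_document(vocab, length=20)
print(" ".join(doc))
print("tf(draft) =", Counter(doc)["draft"])
```

Word order carries no information in this model, so only the counts matter. That is what motivates modeling $tf$ directly.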

Generative model for documents

§ Distribution of term frequencies (tf) follows a binomial distribution, approximated by a Poisson

[Figure: occurrences of "draft" in "... the draft is that each team is given a position in the draft …"]

Poisson distribution

§ The Poisson distribution models the probability of k, the number of events occurring in a fixed interval of time/space, with known average rate λ (= cf/T), independent of the last event
§ Examples
§ Number of cars arriving at the toll booth per minute
§ Number of typos on a page

$$p(k) = \frac{\lambda^k}{k!} e^{-\lambda}$$
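This is a direct transcription of the Poisson formula. It checks numerically that the probabilities sum to 1 and that the mean and variance both equal λ. The rate λ = 53/650 is borrowed from Harter's table later in the lecture (collection frequency 53 over 650 documents). Reading each document as one "interval" is an interpretation.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """p(k) = lam^k e^(-lam) / k!"""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 53 / 650                      # λ = cf / T
probs = [poisson_pmf(k, lam) for k in range(30)]
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
print(round(total, 6), round(mean, 6), round(var, 6))
```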

Poisson distribution

§ If T is large and p is small, we can approximate a binomial distribution with a Poisson where λ = Tp

$$p(k) = \frac{\lambda^k}{k!} e^{-\lambda}$$

§ Mean = Variance = λ = Tp.
§ Example p = 0.08, T = 20. Chance of 1 occurrence is:
§ Binomial

$$P(1) = \binom{20}{1} (.08)^1 (.92)^{19} = .3282$$

§ Poisson … already close

$$P(1) = \frac{[(20)(.08)]^1}{1!} e^{-(20)(.08)} = \frac{1.6}{1} e^{-1.6} = 0.3230$$
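The slide's arithmetic can be reproduced with Python's standard library (`math.comb` requires Python 3.8+). The loop also compares a few other values of k for the same T = 20 and p = 0.08.

```python
import math

def binomial_pmf(k: int, T: int, p: float) -> float:
    """P(k) = C(T, k) p^k (1 - p)^(T - k)"""
    return math.comb(T, k) * p**k * (1 - p) ** (T - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(k) = lam^k e^(-lam) / k!"""
    return lam**k * math.exp(-lam) / math.factorial(k)

T, p = 20, 0.08
lam = T * p  # λ = Tp = 1.6
for k in range(5):
    print(k, round(binomial_pmf(k, T, p), 4), round(poisson_pmf(k, lam), 4))
```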

Poisson model

§ Assume that term frequencies in a document ($tf_i$) follow a Poisson distribution
§ “Fixed interval” implies fixed document length … think roughly constant-sized document abstracts
§ … will fix later

Poisson distributions

[Figure: plots of Poisson distributions]


(One) Poisson Model

§ Is a reasonable fit for “general” words
§ Is a poor fit for topic-specific words
§ … they get higher p(k) than predicted too often

Documents containing k occurrences of word (650 documents; Freq = total occurrences; λ = Freq/650, e.g. 53/650 for "expected")

Freq  Word        k=0    1    2    3    4    5    6    7    8    9   10   11   12
53    expected    599   49    2
52    based       600   48    2
53    conditions  604   39    7
55    cathexis    619   22    3    2    1    2    0    1
51    comic       642    3    0    1    0    0    0    0    0    0    1    1    2

Harter, “A Probabilistic Approach to Automatic Keyword Indexing”, JASIST, 1975
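The fit can be checked against Harter's counts. The sketch below predicts how many of the 650 documents should contain each word k times under a single Poisson with λ = Freq/650. It compares that prediction with two rows of the table. The data come from the table. The helper name is illustrative.

```python
import math

N_DOCS = 650  # size of Harter's sample: every row of the table sums to 650

def poisson_expected_counts(freq: int, max_k: int) -> list:
    """Expected number of documents with k occurrences, k = 0..max_k, under one Poisson."""
    lam = freq / N_DOCS
    return [N_DOCS * lam**k * math.exp(-lam) / math.factorial(k) for k in range(max_k + 1)]

observed = {  # word: (Freq, document counts for k = 0, 1, 2, ...) from the table
    "expected": (53, [599, 49, 2]),
    "comic": (51, [642, 3, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 2]),
}
for word, (freq, counts) in observed.items():
    predicted = poisson_expected_counts(freq, len(counts) - 1)
    print(word)
    for k, (obs, pred) in enumerate(zip(counts, predicted)):
        print(f"  k={k:2d}  observed={obs:4d}  predicted={pred:8.2f}")
```

For "expected", the predictions round to 599, 49 and 2, matching the table. For "comic", the model predicts roughly 601 documents with no occurrence and essentially none with 10 or more. The table instead shows 642 such documents and 4 such documents.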

Eliteness (“aboutness”)

§ Model term frequencies using eliteness
§ What is eliteness?
§ Hidden variable for each document-term pair, denoted as $E_i$ for term i
§ Represents aboutness: a term is elite in a document if, in some sense, the document is about the concept denoted by the term
§ Eliteness is binary
§ Term occurrences depend only on eliteness …
§ … but eliteness depends on relevance

Elite terms

Text from the Wikipedia page on the NFL draft showing elite terms

The National Football League Draft is an annual event in which the National Football League (NFL) teams select eligible college football players. It serves as the league’s most common source of player recruitment. The basic design of the draft is that each team is given a position in the draft order in reverse order relative to its record …

Graphical model with eliteness

[Figure: graphical model over query terms i ∈ q, with term frequencies (not binary) and binary eliteness variables]

Retrieval Status Value

§ Similar to the BIM derivation, we have

$$RSV^{elite} = \sum_{i \in q,\; tf_i > 0} c_i^{elite}(tf_i)$$

where

$$c_i^{elite}(tf_i) = \log \frac{p(TF_i = tf_i \mid R = 1)\, p(TF_i = 0 \mid R = 0)}{p(TF_i = 0 \mid R = 1)\, p(TF_i = tf_i \mid R = 0)}$$

and using eliteness, we have:

$$p(TF_i = tf_i \mid R) = p(TF_i = tf_i \mid E_i = elite)\, p(E_i = elite \mid R) + p(TF_i = tf_i \mid E_i = \overline{elite})\, \big(1 - p(E_i = elite \mid R)\big)$$

2-Poisson model

§ The problems with the 1-Poisson model suggest fitting two Poisson distributions
§ In the “2-Poisson model”, the distribution is different depending on whether the term is elite or not

$$p(TF_i = k_i \mid R) = \pi \frac{\lambda^{k_i}}{k_i!} e^{-\lambda} + (1 - \pi) \frac{\mu^{k_i}}{k_i!} e^{-\mu}$$

§ where π is the probability that the document is elite for the term
§ but, unfortunately, we don't know π, λ, μ
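Below is a minimal sketch of the 2-Poisson mixture. Elite documents (probability π) use rate λ and the rest use rate μ. The script compares its tail with a single Poisson of the same mean. The parameter values are hypothetical: they are chosen to mimic a topic-specific term and are not fitted to Harter's data.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    # computed in log space so large k does not overflow
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def two_poisson_pmf(k: int, pi: float, lam: float, mu: float) -> float:
    """pi * Poisson(k; lam) + (1 - pi) * Poisson(k; mu): elite vs. non-elite documents."""
    return pi * poisson_pmf(k, lam) + (1 - pi) * poisson_pmf(k, mu)

# Hypothetical topic-specific term: 2% of documents elite (rate 4), the rest rate 0.02.
pi, lam, mu = 0.02, 4.0, 0.02
mean = pi * lam + (1 - pi) * mu           # a 1-Poisson with the same mean, for comparison
tail_2p = sum(two_poisson_pmf(k, pi, lam, mu) for k in range(5, 60))
tail_1p = sum(poisson_pmf(k, mean) for k in range(5, 60))
print(f"P(tf >= 5): 2-Poisson {tail_2p:.2e}   1-Poisson {tail_1p:.2e}")
```

The mixture puts far more mass on large $tf$ than a 1-Poisson with the same mean. This is the pattern the "comic" row shows.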


Let’s get an idea: Graphing $c_i^{elite}(tf_i)$ for different parameter values of the 2-Poisson

[Figure: $c_i^{elite}(tf_i)$ for several 2-Poisson parameter settings]

Qualitative properties

§ $c_i^{elite}(0) = 0$
§ $c_i^{elite}(tf_i)$ increases monotonically with $tf_i$
§ … but asymptotically approaches a maximum value as $tf_i \to \infty$ [not true for simple scaling of tf]
§ … with the asymptotic limit being $c_i^{BIM}$, the weight of the eliteness feature
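These properties can be checked numerically. The sketch plugs the 2-Poisson mixture into the $c_i^{elite}$ formula, with separate eliteness probabilities π₁ = P(elite | R=1) and π₀ = P(elite | R=0). All parameter values are hypothetical. In this parameterization the exact limit is log[π₁ p(0|R=0) / (π₀ p(0|R=1))]. That limit approaches the BIM-style eliteness log odds log[π₁(1−π₀) / ((1−π₁)π₀)] only when $e^{-\lambda} \approx 0$ and $e^{-\mu} \approx 1$.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def p_tf(k, pi, lam, mu):
    """2-Poisson P(TF = k | R), where pi = P(elite | R)."""
    return pi * poisson_pmf(k, lam) + (1 - pi) * poisson_pmf(k, mu)

def c_elite(tf, pi1, pi0, lam, mu):
    """log [p(tf | R=1) p(0 | R=0)] / [p(0 | R=1) p(tf | R=0)]."""
    return (math.log(p_tf(tf, pi1, lam, mu)) + math.log(p_tf(0, pi0, lam, mu))
            - math.log(p_tf(0, pi1, lam, mu)) - math.log(p_tf(tf, pi0, lam, mu)))

# Hypothetical parameters: elite rate lam > non-elite rate mu, and
# eliteness more likely in relevant documents (pi1 > pi0).
pi1, pi0, lam, mu = 0.6, 0.05, 4.0, 0.2
weights = [c_elite(tf, pi1, pi0, lam, mu) for tf in range(31)]

# As tf grows the elite component dominates both mixtures, so the weight tends to
# log[pi1 p(0|R=0) / (pi0 p(0|R=1))].
limit = math.log(pi1 * p_tf(0, pi0, lam, mu) / (pi0 * p_tf(0, pi1, lam, mu)))
print([round(w, 3) for w in weights[:8]], round(limit, 3))
```

The weight starts at 0, rises, and levels off at the limit. For large $tf$, consecutive weights agree to machine precision.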

Approximating the saturation function

§ Estimating parameters for the 2-Poisson model is not easy
§ … So approximate it with a simple parametric curve that has the same qualitative properties

$$\frac{tf}{k_1 + tf}$$

Saturation function

§ For high values of $k_1$, increments in $tf_i$ continue to contribute significantly to the score
§ Contributions tail off quickly for low values of $k_1$
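A quick table of $tf/(k_1 + tf)$ for a few $k_1$ values shows the two regimes. The $k_1$ values are arbitrary examples, and $k_1 > 0$ is assumed.

```python
def saturation(tf: float, k1: float) -> float:
    """tf / (k1 + tf): 0 at tf = 0, increasing, and approaching 1 as tf grows (k1 > 0)."""
    return tf / (k1 + tf)

for k1 in (0.5, 2.0, 10.0):
    print(f"k1={k1:>4}:", [round(saturation(tf, k1), 3) for tf in (0, 1, 2, 5, 10, 100)])
```

With $k_1 = 0.5$ the curve is above 0.95 by $tf = 10$. With $k_1 = 10$ it is only at 0.5, so going from 5 to 10 occurrences still adds noticeably.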

“Early” versions of BM25

§ Version 1: using the saturation function

$$c_i^{BM25v1}(tf_i) = c_i^{BIM} \frac{tf_i}{k_1 + tf_i}$$

§ Version 2: BIM simplification to IDF

$$c_i^{BM25v2}(tf_i) = \log \frac{N}{df_i} \times \frac{(k_1 + 1)\, tf_i}{k_1 + tf_i}$$

§ The $(k_1 + 1)$ factor doesn't change the ranking, but makes the tf component equal to 1 when $tf_i = 1$
§ Similar to tf-idf, but term scores are bounded (by $(k_1 + 1) \log \frac{N}{df_i}$)
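The sketch below implements Version 2 and contrasts it with a log-scaled tf-idf, which has no upper bound. It uses natural logs and hypothetical collection statistics.

```python
import math

def bm25_v2(tf: int, df: int, N: int, k1: float = 1.2) -> float:
    """Early BM25 (v2): log(N/df) * (k1 + 1) tf / (k1 + tf)."""
    return math.log(N / df) * (k1 + 1) * tf / (k1 + tf)

def log_tf_idf(tf: int, df: int, N: int) -> float:
    """Comparison: (1 + log tf) * log(N/df), which keeps growing with tf."""
    return (1 + math.log(tf)) * math.log(N / df) if tf > 0 else 0.0

N, df, k1 = 1_000_000, 1_000, 1.2   # hypothetical statistics
for tf in (1, 2, 10, 100, 10_000):
    print(tf, round(bm25_v2(tf, df, N, k1), 3), round(log_tf_idf(tf, df, N), 3))
```

At $tf = 1$ the BM25 weight equals the IDF. It never exceeds $(k_1 + 1)$ times the IDF.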

Document length normalization

§ Longer documents are likely to have larger $tf_i$ values
§ Why might documents be longer?
§ Verbosity: suggests observed $tf_i$ too high
§ Larger scope: suggests observed $tf_i$ may be right
§ A real document collection probably has both effects
§ … so should apply some kind of partial normalization


Document length normalization

§ Document length:

$$dl = \sum_{i \in V} tf_i$$

§ avdl: Average document length over collection
§ Length normalization component

$$B = \left( (1 - b) + b \frac{dl}{avdl} \right), \quad 0 \le b \le 1$$

§ b = 1: full document length normalization
§ b = 0: no document length normalization
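The normalization component is a one-liner. The loop shows how $b$ interpolates between no normalization and full scaling by $dl/avdl$. The collection's average length of 100 tokens is hypothetical.

```python
def length_norm(dl: float, avdl: float, b: float) -> float:
    """B = (1 - b) + b * dl / avdl, with 0 <= b <= 1."""
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must lie in [0, 1]")
    return (1 - b) + b * dl / avdl

# Hypothetical collection with average document length 100 tokens.
for dl in (50, 100, 400):
    print(dl, [round(length_norm(dl, 100, b), 3) for b in (0.0, 0.75, 1.0)])
```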

Document length normalization

Okapi BM25

§ Normalize tf using document length

$$tf_i' = \frac{tf_i}{B}$$

§ BM25 ranking function

$$c_i^{BM25}(tf_i) = \log \frac{N}{df_i} \times \frac{(k_1 + 1)\, tf_i'}{k_1 + tf_i'} = \log \frac{N}{df_i} \times \frac{(k_1 + 1)\, tf_i}{k_1 \left( (1 - b) + b \frac{dl}{avdl} \right) + tf_i}$$

$$RSV^{BM25} = \sum_{i \in q} c_i^{BM25}(tf_i)$$
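Here is a compact Python sketch of $RSV^{BM25}$ over a toy collection. It uses the second (closed) form of the formula with natural logs; changing the log base rescales every score by the same constant, so the ranking is unaffected. The documents, the whitespace tokenization and the defaults $k_1 = 1.2$, $b = 0.75$ are illustrative assumptions.

```python
import math
from collections import Counter

def bm25_rsv(query, doc, docs, k1=1.2, b=0.75):
    """RSV^BM25 of `doc` (a token list) for `query` within collection `docs`."""
    N = len(docs)
    avdl = sum(len(d) for d in docs) / N
    B = (1 - b) + b * len(doc) / avdl          # length normalization component
    tf = Counter(doc)
    score = 0.0
    for term in set(query):                    # i ∈ q
        df = sum(1 for d in docs if term in d)
        if tf[term] == 0 or df == 0:
            continue                           # absent terms contribute nothing
        idf = math.log(N / df)
        score += idf * (k1 + 1) * tf[term] / (k1 * B + tf[term])
    return score

# Hypothetical toy collection (whitespace tokenization, no stemming).
docs = [
    "the nfl draft is an annual event".split(),
    "each team is given a position in the draft order".split(),
    "college football players".split(),
    "the weather today".split(),
]
query = ["nfl", "draft"]
scores = [bm25_rsv(query, d, docs) for d in docs]
ranking = sorted(range(len(docs)), key=lambda j: scores[j], reverse=True)
print([round(s, 3) for s in scores], ranking)
```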

Okapi BM25

§ $k_1$ controls term frequency scaling
§ $k_1 = 0$ is binary model; $k_1$ large is raw term frequency
§ $b$ controls document length normalization
§ $b = 0$ is no length normalization; $b = 1$ is relative frequency (fully scale by document length)
§ Typically, $k_1$ is set around 1.2–2 and $b$ around 0.75
§ IIR sec. 11.4.3 discusses incorporating query term weighting and (pseudo) relevance feedback

$$RSV^{BM25} = \sum_{i \in q} \log \frac{N}{df_i} \cdot \frac{(k_1 + 1)\, tf_i}{k_1 \left( (1 - b) + b \frac{dl}{avdl} \right) + tf_i}$$
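The parameter limits on this slide can be checked with a single BM25 term weight. The collection statistics are hypothetical. A very large $k_1$ (here $10^6$) stands in for "$k_1$ large".

```python
import math

def bm25_term(tf, df, N, dl, avdl, k1=1.2, b=0.75):
    """One BM25 term weight: log(N/df) (k1+1) tf / (k1 ((1-b) + b dl/avdl) + tf)."""
    if tf == 0:
        return 0.0
    B = (1 - b) + b * dl / avdl
    return math.log(N / df) * (k1 + 1) * tf / (k1 * B + tf)

N, df, avdl = 10_000, 100, 200   # hypothetical collection statistics
idf = math.log(N / df)

# k1 = 0: every nonzero tf gives the same weight, idf (binary model).
print([round(bm25_term(tf, df, N, avdl, avdl, k1=0.0) / idf, 3) for tf in (1, 3, 10)])
# Very large k1 (with dl = avdl): weight / idf is close to the raw tf.
print([round(bm25_term(tf, df, N, avdl, avdl, k1=1e6) / idf, 3) for tf in (1, 3, 10)])
# b = 0 ignores length; b = 1 divides tf by dl / avdl (a document 4x the average length).
print(round(bm25_term(4, df, N, 800, avdl, b=0.0), 3), round(bm25_term(4, df, N, 800, avdl, b=1.0), 3))
```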

Why is BM25 better than VSM tf-idf?

§ Suppose your query is [machine learning]
§ Suppose you have 2 documents with term counts:
§ doc1: learning 1024; machine 1
§ doc2: learning 16; machine 8
§ tf-idf: $(1 + \log_2 tf) \times \log_2 (N/df)$, where $\log_2(N/df)$ is 7 for learning and 10 for machine
§ doc1: 11 × 7 + 1 × 10 = 87
§ doc2: 5 × 7 + 4 × 10 = 75
§ BM25: $k_1 = 2$, length normalization ignored (B = 1)
§ doc1: 7 × 3 + 10 × 1 = 31
§ doc2: 7 × 2.67 + 10 × 2.4 = 42.7
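The slide's numbers can be reproduced in a few lines. The IDF values 7 and 10 are read off the slide's arithmetic. BM25 is computed with B = 1, as those numbers imply.

```python
import math

# IDF values implied by the slide: log2(N/df) = 7 for "learning", 10 for "machine".
idf = {"learning": 7.0, "machine": 10.0}
docs = {
    "doc1": {"learning": 1024, "machine": 1},
    "doc2": {"learning": 16, "machine": 8},
}

def tf_idf(counts):
    """Sum of (1 + log2 tf) * idf over terms with tf > 0."""
    return sum((1 + math.log2(tf)) * idf[t] for t, tf in counts.items() if tf > 0)

def bm25_no_length(counts, k1=2.0):
    """BM25 with B = 1: sum of idf * (k1 + 1) tf / (k1 + tf)."""
    return sum(idf[t] * (k1 + 1) * tf / (k1 + tf) for t, tf in counts.items())

for name, counts in docs.items():
    print(name, tf_idf(counts), round(bm25_no_length(counts), 2))
```

tf-idf prefers doc1 because of its 1024 occurrences of "learning". BM25 saturates that count near $k_1 + 1 = 3$, so it prefers doc2, which matches both query terms substantially.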

2. Ranking with features

§ Textual features
§ Zones: Title, author, abstract, body, anchors, …
§ Proximity
§ …
§ Non-textual features
§ File type
§ File age
§ PageRank
§ …


Ranking with zones

§ Straightforward idea:
§ Apply your favorite ranking function (BM25) to each zone separately
§ Combine zone scores using a weighted linear combination
§ But that seems to imply that the eliteness properties of different zones are different and independent of each other
§ … which seems unreasonable

Ranking with zones

§ Alternate idea
§ Assume eliteness is a term/document property shared across zones
§ … but the relationship between eliteness and term frequencies is zone-dependent
§ e.g., denser use of elite topic words in title
§ Consequence
§ First combine evidence across zones for each term
§ Then combine evidence across terms

BM25F with zones

§ Calculate a weighted variant of total term frequency
§ … and a weighted variant of document length

$$\tilde{tf}_i = \sum_{z=1}^{Z} v_z\, tf_{zi}, \qquad \tilde{dl} = \sum_{z=1}^{Z} v_z\, len_z, \qquad \widetilde{avdl} = \text{average of } \tilde{dl} \text{ across all documents}$$

where $v_z$ is zone weight, $tf_{zi}$ is term frequency in zone z, $len_z$ is length of zone z, and Z is the number of zones
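The weighted counts are easy to compute per document. The document and the zone weights ($v_{title} = 3$, $v_{body} = 1$) below are hypothetical. The second half checks that with integer weights, counting zone z $v_z$ times gives the same $\tilde{tf}_i$ and $\tilde{dl}$.

```python
from collections import Counter

def weighted_tf_dl(zones: dict, weights: dict, term: str):
    """tf~ = sum_z v_z tf_zi and dl~ = sum_z v_z len_z for one document."""
    tf = sum(weights[z] * Counter(tokens)[term] for z, tokens in zones.items())
    dl = sum(weights[z] * len(tokens) for z, tokens in zones.items())
    return tf, dl

# Hypothetical document with two zones.
doc = {
    "title": "nfl draft".split(),
    "body": "each team is given a position in the draft order".split(),
}
v = {"title": 3, "body": 1}
print(weighted_tf_dl(doc, v, "draft"))

# Same numbers from literally repeating each zone v_z times.
replicated = [tok for z, toks in doc.items() for _ in range(v[z]) for tok in toks]
print(replicated.count("draft"), len(replicated))
```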

Simple BM25F with zones

§ Simple interpretation: zone z is “replicated” $v_z$ times
§ But we may want zone-specific parameters ($k_1$, b, IDF)

$$RSV^{SimpleBM25F} = \sum_{i \in q} \log \frac{N}{df_i} \cdot \frac{(k_1 + 1)\, \tilde{tf}_i}{k_1 \left( (1 - b) + b \frac{\tilde{dl}}{\widetilde{avdl}} \right) + \tilde{tf}_i}$$
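Here is a minimal sketch of Simple BM25F: ordinary BM25 applied to $\tilde{tf}_i$, $\tilde{dl}$ and $\widetilde{avdl}$. Documents are dicts mapping zone names to token lists. Counting $df$ at the document level (a term in any zone) is an assumption of this sketch. The collection and weights are hypothetical.

```python
import math

def simple_bm25f(query, doc, docs, v, k1=1.2, b=0.75):
    """Simple BM25F: ordinary BM25 applied to zone-weighted tf~ and dl~.

    Each document is a dict zone -> token list; v[z] is the zone weight.
    df is counted at the document level (term occurs in any zone).
    """
    def weighted_len(d):
        return sum(v[z] * len(toks) for z, toks in d.items())

    N = len(docs)
    avdl = sum(weighted_len(d) for d in docs) / N      # avdl~
    B = (1 - b) + b * weighted_len(doc) / avdl
    score = 0.0
    for term in set(query):
        df = sum(1 for d in docs if any(term in toks for toks in d.values()))
        tf = sum(v[z] * toks.count(term) for z, toks in doc.items())  # tf~_i
        if tf > 0:
            score += math.log(N / df) * (k1 + 1) * tf / (k1 * B + tf)
    return score

# Hypothetical three-document collection with title and body zones.
docs = [
    {"title": ["nfl", "draft"], "body": ["the", "annual", "draft", "event"]},
    {"title": ["football"], "body": ["the", "draft", "order", "for", "each", "team"]},
    {"title": ["weather"], "body": ["rain", "today"]},
]
v = {"title": 3.0, "body": 1.0}
scores = [simple_bm25f(["nfl", "draft"], d, docs, v) for d in docs]
print([round(s, 3) for s in scores])
```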

BM25F

§ Empirically, zone-specific length normalization (i.e., zone-specific b) has been found to be useful

$$\tilde{tf}_i = \sum_{z=1}^{Z} v_z \frac{tf_{zi}}{B_z}$$

$$B_z = \left( (1 - b_z) + b_z \frac{len_z}{avlen_z} \right), \quad 0 \le b_z \le 1$$

$$RSV^{BM25F} = \sum_{i \in q} \log \frac{N}{df_i} \cdot \frac{(k_1 + 1)\, \tilde{tf}_i}{k_1 + \tilde{tf}_i}$$

See Robertson and Zaragoza (2009: 364)
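The sketch below follows the BM25F formulas. Each zone count is first normalized by its own $B_z$ and weighted by $v_z$. Saturation is then applied once per term. Document-level $df$, the toy collection and the per-zone $b_z$ values are assumptions for illustration.

```python
import math

def bm25f(query, doc, docs, v, bz, k1=1.2):
    """BM25F sketch: per-zone length normalization B_z, then one saturation per term.

    Documents are dicts zone -> token list; v[z] are zone weights, bz[z] zone-specific b.
    """
    N = len(docs)
    avlen = {z: sum(len(d.get(z, [])) for d in docs) / N for z in v}
    score = 0.0
    for term in set(query):
        df = sum(1 for d in docs if any(term in d.get(z, []) for z in v))
        tf = 0.0                                   # weighted, normalized tf~_i
        for z in v:
            toks = doc.get(z, [])
            count = toks.count(term)
            if count:
                B_z = (1 - bz[z]) + bz[z] * len(toks) / avlen[z]
                tf += v[z] * count / B_z
        if tf > 0:
            score += math.log(N / df) * (k1 + 1) * tf / (k1 + tf)
    return score

docs = [
    {"title": ["nfl", "draft"], "body": ["the", "annual", "draft", "event"]},
    {"title": ["football"], "body": ["the", "draft", "order", "for", "each", "team"]},
    {"title": ["weather"], "body": ["rain", "today"]},
]
v = {"title": 3.0, "body": 1.0}
bz = {"title": 0.0, "body": 0.75}   # hypothetical: normalize body length only
scores = [bm25f(["nfl", "draft"], d, docs, v, bz) for d in docs]
print([round(s, 3) for s in scores])
```

Setting every $b_z = 0$ makes all $B_z = 1$. The result is then the same as Simple BM25F with $b = 0$.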

Ranking with non-textual features

§ Assumptions
§ Usual independence assumption
§ Independent of each other and of the textual features
§ Allows us to factor out $\frac{p(F_j = f_j \mid R = 1)}{p(F_j = f_j \mid R = 0)}$ in BIM-style derivation
§ Relevance information is query independent
§ Usually true for features like PageRank, age, type, …
§ Allows us to keep all non-textual features in the BIM-style derivation where we drop non-query terms


Ranking with non-textual features

$$RSV = \sum_{i \in q} c_i(tf_i) + \sum_{j=1}^{F} \lambda_j V_j(f_j)$$

where

$$V_j(f_j) = \log \frac{p(F_j = f_j \mid R = 1)}{p(F_j = f_j \mid R = 0)}$$

and $\lambda_j$ is an artificially added free parameter to account for rescalings in the approximations

§ Care must be taken in selecting $V_j$ depending on $F_j$. E.g.

$$\log(\lambda'_j + f_j), \qquad \frac{f_j}{\lambda'_j + f_j}, \qquad \frac{1}{\lambda'_j + \exp(-f_j \lambda''_j)}$$

§ Explains why $RSV^{BM25} + \log(pagerank)$ works well
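This sketch writes the combined score as a textual score plus weighted feature transforms, using the three example $V_j$ shapes. `lp` and `lpp` stand for $\lambda'_j$ and $\lambda''_j$. The BM25 score, the PageRank value and all parameter values are hypothetical. $RSV^{BM25} + \log(pagerank)$ is the special case $V = \log(0 + f)$ with $\lambda_j = 1$.

```python
import math

# Example transformations V_j(f_j); lp stands for λ'_j and lpp for λ''_j.
def v_log(f, lp):
    return math.log(lp + f)

def v_saturating(f, lp):
    return f / (lp + f)

def v_sigmoid(f, lp, lpp):
    return 1.0 / (lp + math.exp(-f * lpp))

def combined_rsv(text_score, features, lambdas, transforms):
    """RSV = textual score + sum_j lambda_j * V_j(f_j)."""
    return text_score + sum(lambdas[j] * transforms[j](f) for j, f in features.items())

# Hypothetical document: BM25 score 7.3 and PageRank 0.004.
features = {"pagerank": 0.004}
transforms = {"pagerank": lambda f: v_log(f, 0.0)}   # V = log(f): the BM25 + log(pagerank) case
print(round(combined_rsv(7.3, features, {"pagerank": 1.0}, transforms), 3))
print(round(v_saturating(50.0, 10.0), 3), round(v_sigmoid(2.0, 1.0, 1.5), 3))
```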

User Behavior

§ Search Results for “CIKM” (in 2010!)

[Figure: search results for “CIKM” with the # of clicks received by each]

Taken with slight adaptation from Fan Guo and Chao Liu’s 2009/2010 CIKM tutorial: Statistical Models for Web Search: Click Log Analysis

User Behavior

§ Adapt ranking to user clicks?

[Figure: search results with the # of clicks received by each]

User Behavior

§ Tools needed for non-trivial cases

[Figure: search results with the # of clicks received by each]

Web search click log

[Figure: an example click log]

Web Search Click Log

§ How large is the click log?
§ search logs: 10+ TB/day
§ In existing publications:
§ [Silverstein+99]: 285M sessions
§ [Craswell+08]: 108k sessions
§ [Dupret+08]: 4.5M sessions (21 subsets × 216k sessions)
§ [Guo+09a]: 8.8M sessions from 110k unique queries
§ [Guo+09b]: 8.8M sessions from 110k unique queries
§ [Chapelle+09]: 58M sessions from 682k unique queries
§ [Liu+09a]: 0.26PB data from 103M unique queries


Interpret Clicks: an Example

§ Clicks are good…
§ Are these two clicks equally “good”?
§ Non-clicks may have excuses:
§ Not relevant
§ Not examined

Eye-tracking User Study

Click Position-bias [Joachims+07]

§ Higher positions receive more user attention (eye fixation) and clicks than lower positions.
§ This is true even in the extreme setting where the order of positions is reversed.
§ “Clicks are informative but biased”.

[Figure: two panels, “Normal Position” and “Reversed Impression”, each plotting percentages by result position]

User behavior

§ User behavior is an intriguing source of relevance data
§ Users make (somewhat) informed choices when they interact with search engines
§ Potentially a lot of data available in search logs
§ But there are significant caveats
§ User behavior data can be very noisy
§ Interpreting user behavior can be tricky
§ Spam can be a significant problem
§ Not all queries will have user behavior

Features based on user behavior

From [Agichtein, Brill, Dumais 2006; Joachims 2002]

§ Click-through features
§ Click frequency, click probability, click deviation
§ Click on next result? previous result? above? below?
§ Browsing features
§ Cumulative and average time on page, on domain, on URL prefix; deviation from average times
§ Browse path features
§ Query-text features
§ Query overlap with title, snippet, URL, domain, next query
§ Query length
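The sketch below computes the three click-through features named above from a toy log. The feature names come from the slide. The exact definitions are assumptions for illustration and may differ from Agichtein et al. (2006). In particular, click deviation is taken here to be the observed click rate minus the rate expected from the positions where the result was shown. This ties in with the position bias discussed earlier. The log records and URLs are hypothetical.

```python
from collections import defaultdict

# Hypothetical click log: (query, url, position shown, clicked?)
log = [
    ("cikm", "url_A", 1, True), ("cikm", "url_A", 1, True),
    ("cikm", "url_B", 2, False), ("cikm", "url_B", 2, True),
    ("cikm", "url_C", 3, False), ("cikm", "url_C", 3, False),
    ("ir", "url_D", 1, True), ("ir", "url_E", 2, False),
]

def click_features(records):
    # Position prior: overall click rate at each position, pooled over all queries.
    shown_at, clicked_at = defaultdict(int), defaultdict(int)
    for _, _, pos, clicked in records:
        shown_at[pos] += 1
        clicked_at[pos] += int(clicked)
    prior = {pos: clicked_at[pos] / shown_at[pos] for pos in shown_at}

    # Per (query, url): impressions, clicks, and summed prior over its impressions.
    stats = defaultdict(lambda: [0, 0, 0.0])
    for q, url, pos, clicked in records:
        s = stats[(q, url)]
        s[0] += 1
        s[1] += int(clicked)
        s[2] += prior[pos]

    return {
        key: {
            "click_frequency": clicks,
            "click_probability": clicks / shown,
            # observed click rate minus the rate expected from position alone
            "click_deviation": clicks / shown - prior_sum / shown,
        }
        for key, (shown, clicks, prior_sum) in stats.items()
    }

for key, feats in click_features(log).items():
    print(key, {name: round(x, 3) for name, x in feats.items()})
```

url_B is clicked once in two impressions at position 2. The pooled click rate at position 2 is 1/3, so url_B gets a positive deviation. url_A is clicked exactly as often as position 1 predicts, so its deviation is zero.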

Incorporating user behavior into ranking algorithm

§ Incorporate user behavior features into a ranking function like BM25F
§ But requires an understanding of user behavior features so that appropriate $V_j$ functions are used
§ Incorporate user behavior features into learned ranking function
§ Either of these ways of incorporating user behavior signals improves ranking


Resources

§ S. E. Robertson and H. Zaragoza. 2009. The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval 3(4): 333–389.
§ K. Spärck Jones, S. Walker, and S. E. Robertson. 2000. A probabilistic model of information retrieval: Development and comparative experiments. Part 1. Information Processing and Management 779–808.
§ T. Joachims. 2002. Optimizing Search Engines using Clickthrough Data. SIGKDD.
§ E. Agichtein, E. Brill, S. Dumais. 2006. Improving Web Search Ranking By Incorporating User Behavior Information. SIGIR.