Information Retrieval

5/21/17

IntroductiontoInformationRetrieval

Introductionto

InformationRetrieval

BM25,BM25F,andUserBehaviorChrisManningandPanduNayak

Summary– BIM[Robertson&Spärck-Jones1976]§ Boilsdownto

§ Withconstantpi =0.5,simplifiestoIDFweighting:

RSV = log Nnixi=qi=1

RSV BIM = ciBIM ;

xi=qi=1∑ ci

BIM = log pi (1− ri )(1− pi )ri

document relevant(R=1) notrelevant(R=0)

termpresent xi =1 pi ritermabsent xi =0 (1– pi) (1 – ri)

Logoddsratio

GraphicalmodelforBIM– BernoulliNB

i ∈ q

Binaryvariablesxi = (tfi ≠ 0)

AkeylimitationoftheBIM§ BIM– likemuchoforiginalIR– wasdesignedfortitlesorabstracts,andnotformodernfulltextsearch

§ Wewanttopayattentiontotermfrequencyanddocumentlengths,justlikeinothermodelswediscuss

§ Want

§ Wantsomemodelofhowoftentermsoccurindocs

ci = logptf r0p0rtf

1.OkapiBM25 [Robertsonetal.1994,TRECCityU.]

§ BM25“BestMatch25”(theyhadabunchoftries!)§ DevelopedinthecontextoftheOkapisystem§ StartedtobeincreasinglyadoptedbyotherteamsduringtheTRECcompetitions

§ Itworkswell

§ Goal:besensitivetotermfrequencyanddocumentlengthwhilenotaddingtoomanyparameters§ (RobertsonandZaragoza2009;Spärck Jonesetal.2000)

5/21/17

§ Wordsaredrawnindependentlyfromthevocabularyusingamultinomialdistribution

Generativemodelfordocuments

... the draft is that each team is given a position in the draft …

team each

thatof

the draft

designnfl

football

annual draftfootball

§ Distributionoftermfrequencies(tf)followsabinomialdistribution– approximatedbyaPoisson

Generativemodelfordocuments

... the draft is that each team is given a position in the draft …

Poissondistribution§ ThePoissondistributionmodelstheprobabilityofk,thenumberofeventsoccurringinafixedintervaloftime/space,withknownaveragerateλ (=cf/T),independentofthelastevent

§ Examples§ Numberofcarsarrivingatthetollboothperminute§ Numberoftyposonapage

p(k) = λk

k!e−λ

Poissondistribution§ IfTislargeandpissmall,wecanapproximateabinomialdistributionwithaPoissonwhereλ =Tp

§ Mean=Variance=λ =Tp.§ Examplep= 0.08,T=20.Chanceof1occurrenceis:

§ Binomial

§ Poisson…alreadyclose

p(k) = λk

k!e−λ

P(1) = [(20)(.08)]1

1!e−(20)(.08) = 1.6

1e−1.6 = 0.3230

P(1) = 201

%&(.08)1(.92)19 = .3282

Poissonmodel§ Assumethattermfrequenciesinadocument(tfi)followaPoissondistribution§ “Fixedinterval”impliesfixeddocumentlength…thinkroughlyconstant-sizeddocumentabstracts§…willfixlater

Poissondistributions

5/21/17

(One)PoissonModel§ Isareasonablefitfor“general”words§ Isapoorfitfortopic-specificwords

§ gethigherp(k)thanpredictedtoooften

Documents containingk occurrencesofword(λ =53/650)Freq Word 0 1 2 3 4 5 6 7 8 9 10 11 1253 expected 599 49 252 based 600 48 2

53 conditions 604 39 7

55 cathexis 619 22 3 2 1 2 0 151 comic 642 3 0 1 0 0 0 0 0 0 1 1 2

Harter, “A Probabilistic Approach to Automatic Keyword Indexing”, JASIST, 1975

Eliteness(“aboutness”)§ Modeltermfrequenciesusingeliteness§ Whatiseliteness?

§ Hiddenvariableforeachdocument-termpair,denotedasEi fortermi

§ Representsaboutness:atermiseliteinadocumentif,insomesense,thedocumentisabouttheconceptdenotedbytheterm

§ Elitenessisbinary§ Termoccurrencesdependonlyoneliteness…§ …butelitenessdependsonrelevance

ElitetermsTextfromtheWikipediapageontheNFLdraftshowingeliteterms

The National Football League Draft is an annual event in which the National Football League (NFL) teams select eligible college football players. It serves as the league’s most common source of player recruitment. The basic design of the draft is that each team is given a position in the draft order in reverse order relative to its record …

Graphicalmodelwitheliteness

i ∈ q

Frequencies(notbinary)

Binaryvariables

RetrievalStatusValue§ SimilartotheBIMderivation,wehave

andusingeliteness,wehave:

RSV elite = cielite

i∈q,tfi>0∑ (tfi );

p(TFi = tfi R) = p(TFi = tfi Ei = elite)p(Ei = elite R)+p(TFi = tfi Ei = elite)(1− p(Ei = elite R))

cielite(tfi ) = log

p(TFi = tfi R =1)p(TFi = 0 R = 0)p(TFi = 0 R =1)p(TFi = tfi R = 0)

2-Poissonmodel§ Theproblemswiththe1-PoissonmodelsuggestsfittingtwoPoissondistributions

§ Inthe“2-Poissonmodel”,thedistributionisdifferentdependingonwhetherthetermiseliteornot

§ whereπisprobabilitythatdocumentiseliteforterm§ but,unfortunately,wedon’tknowπ,λ,μ

p(TFi = ki R) =

5/21/17

Let’sgetanidea:Graphingfordifferentparametervaluesofthe2-Poisson

cielite(tfi )

Qualitativeproperties

§ increasesmonotonicallywithtfi

§ …butasymptoticallyapproachesamaximumvalueas[nottrueforsimplescalingoftf]

§ … withtheasymptoticlimitbeing

cielite(0) = 0

cielite(tfi )

Weightofelitenessfeature

tfi →∞

Approximatingthesaturationfunction§ Estimatingparametersforthe2-Poissonmodelisnoteasy

§ …Soapproximateitwithasimpleparametriccurvethathasthesamequalitativeproperties

tfk1 + tf

Saturationfunction

§ Forhighvaluesofk1,incrementsintfi continuetocontributesignificantlytothescore

§ Contributionstailoffquicklyforlowvaluesofk1

“Early”versionsofBM25§ Version1:usingthesaturationfunction

§ Version2:BIMsimplificationtoIDF

§ (k1+1) factordoesn’tchangeranking,butmakestermscore1whentfi = 1

§ Similartotf-idf,buttermscoresarebounded

ciBM 25v1(tfi ) = ci

BIM tfik1 + tfi

ciBM 25v2 (tfi ) = log

×(k1 +1)tfik1 + tfi

Documentlengthnormalization§ Longerdocumentsarelikelytohavelargertfi values

§ Whymightdocumentsbelonger?§ Verbosity:suggestsobservedtfi toohigh§ Largerscope:suggestsobservedtfi mayberight

§ Arealdocumentcollectionprobablyhasbotheffects§ …soshouldapplysomekindofpartialnormalization

5/21/17

Documentlengthnormalization§ Documentlength:

§ avdl:Averagedocumentlengthovercollection§ Lengthnormalizationcomponent

§ b = 1 fulldocumentlengthnormalization§ b = 0 nodocumentlengthnormalization

dl = tfii∈V∑

B = (1− b)+ b dlavdl

&', 0 ≤ b ≤1

Documentlengthnormalization

OkapiBM25§ Normalizetf usingdocumentlength

§ BM25rankingfunction

t !fi =tfiB

ciBM 25(tfi ) = log

×(k1 +1)t "fik1 + t "fi

= log Ndfi

×(k1 +1)tfi

k1((1− b)+ bdlavdl

)+ tfi

RSV BM 25 = ciBM 25

i∈q∑ (tfi );

OkapiBM25

§ k1 controlstermfrequencyscaling§ k1 = 0 isbinarymodel;k1 largeisrawtermfrequency

§ b controlsdocumentlengthnormalization§ b = 0 isnolengthnormalization;b = 1 isrelativefrequency(fullyscalebydocumentlength)

§ Typically,k1 issetaround1.2–2andb around0.75§ IIRsec.11.4.3discussesincorporatingquerytermweightingand(pseudo)relevancefeedback

RSV BM 25 = log Ndfii∈q

∑ ⋅(k1 +1)tfi

)+ tfi

WhyisBM25betterthanVSMtf-idf?§ Supposeyourqueryis[machinelearning]§ Supposeyouhave2documentswithtermcounts:

§ doc1:learning1024;machine1§ doc2:learning16;machine8

§ tf-idf:log2 tf *log2 (N/df)§ doc1:11*7+1*10 = 87§ doc2:5*7+4*10=75

§ BM25:k1 =2§ doc1:7*3+10*1=31§ doc2:7*2.67+10*2.4=42.7

2.Rankingwithfeatures§ Textualfeatures

§ Zones:Title,author,abstract,body,anchors,…§ Proximity§ …

§ Non-textualfeatures§ Filetype§ Fileage§ Pagerank§ …

5/21/17

Rankingwithzones§ Straightforwardidea:

§ Applyyourfavoriterankingfunction(BM25)toeachzoneseparately

§ Combinezonescoresusingaweightedlinearcombination

§ Butthatseemstoimplythattheelitenesspropertiesofdifferentzonesaredifferentandindependentofeachother§ …whichseemsunreasonable

Rankingwithzones§ Alternateidea

§ Assumeelitenessisaterm/documentpropertysharedacrosszones

§ …buttherelationshipbetweenelitenessandtermfrequenciesarezone-dependent§ e.g.,denseruseofelitetopicwordsintitle

§ Consequence§ Firstcombineevidenceacrosszonesforeachterm§ Thencombineevidenceacrossterms

BM25Fwithzones§ Calculateaweightedvariantoftotaltermfrequency§ …andaweightedvariantofdocumentlength

wherevz iszoneweighttfzi istermfrequencyinzonezlenz islengthofzonezZ isthenumberofzones

tfi = vztfziz=1

∑ dl = vzlenzz=1

∑ avdl = Averageacrossalldocuments

SimpleBM25Fwithzones

§ Simpleinterpretation:zonez is“replicated”vz times

§ Butwemaywantzone-specificparameters(k1, b,IDF)

RSV SimpleBM 25F = log Ndfii∈q

∑ ⋅(k1 +1)tfi

)+ tfi

BM25F§ Empirically,zone-specificlengthnormalization(i.e.,zone-specificb)hasbeenfoundtobeuseful

tfi = vztfziBzz=1

Bz = (1− bz )+ bzlenzavlenz

&', 0 ≤ bz ≤1

RSV BM 25F = log Ndfii∈q

∑ ⋅(k1 +1)tfik1 + tfi

See Robertson and Zaragoza (2009: 364)

Rankingwithnon-textualfeatures§ Assumptions

§ Usualindependenceassumption§ Independentofeachotherandofthetextualfeatures§ AllowsustofactoroutinBIM-stylederivation

§ Relevanceinformationisqueryindependent§ Usuallytrueforfeatureslikepagerank,age,type,…§ Allowsustokeepallnon-textualfeaturesintheBIM-stylederivationwherewedropnon-queryterms

p(Fj = f j R =1)p(Fj = f j R = 0)

5/21/17

Rankingwithnon-textualfeatures

andisanartificiallyaddedfreeparametertoaccountforrescalings intheapproximations§ CaremustbetakeninselectingVj dependingonFj.E.g.

§ Explainswhyworkswell

RSV = cii∈q∑ (tfi )+ λ jVj ( f j )

Vj ( f j ) = logp(Fj = f j R =1)p(Fj = f j R = 0)

log( !λ j + f j )f j!λ j + f j

1!λ j + exp(− f j !!λ j )

RSV BM 25 + log(pagerank)

UserBehavior§ Search Results for “CIKM” (in 2010!)

# of clicks received

Taken with slight adaptation from Fan Guo and Chao Liu’s 2009/2010 CIKM tutorial: Statistical Models for Web Search: Click Log Analysis

UserBehavior§ Adapt ranking to user clicks?

UserBehavior§ Tools needed for non-trivial cases

Websearchclicklog

An example

WebSearchClickLog§ How large is the click log?

§ search logs: 10+ TB/day

§ In existing publications:§ [Silverstein+99]: 285M sessions§ [Craswell+08]: 108k sessions§ [Dupret+08] : 4.5M sessions (21 subsets * 216k sessions)§ [Guo +09a] : 8.8M sessions from 110k unique queries§ [Guo+09b]: 8.8M sessions from 110k unique queries§ [Chapelle+09]: 58M sessions from 682k unique queries§ [Liu+09a]: 0.26PB data from 103M unique queries

5/21/17

InterpretClicks:anExample

§ Clicks are good…§ Are these two clicks

equally “good”?

§ Non-clicks may have excuses:§ Not relevant§ Not examined

Eye-trackingUserStudy

§ Higher positions receive more user attention (eye fixation) and clicks than lower positions.

§ This is true even in the extreme setting where the order of positions is reversed.

§ “Clicks are informative but biased”.

[Joachims+07]

ClickPosition-bias

NormalPosition

Percen

ReversedImpression

Percen

Userbehavior§ Userbehaviorisanintriguingsourceofrelevancedata

§ Usersmake(somewhat)informedchoiceswhentheyinteractwithsearchengines

§ Potentiallyalotofdataavailableinsearchlogs

§ Buttherearesignificantcaveats§ Userbehaviordatacanbeverynoisy§ Interpretinguserbehaviorcanbetricky§ Spamcanbeasignificantproblem§ Notallquerieswillhaveuserbehavior

FeaturesbasedonuserbehaviorFrom[Agichtein,Brill,Dumais 2006;Joachims 2002]§ Click-throughfeatures

§ Clickfrequency,clickprobability,clickdeviation§ Clickonnextresult?previous result?above?below>?

§ Browsingfeatures§ Cumulativeandaveragetimeonpage,ondomain,onURLprefix;deviationfromaveragetimes

§ Browsepathfeatures§ Query-textfeatures

§ Queryoverlapwithtitle,snippet,URL,domain,nextquery§ Querylength

Incorporatinguserbehaviorintorankingalgorithm§ IncorporateuserbehaviorfeaturesintoarankingfunctionlikeBM25F§ ButrequiresanunderstandingofuserbehaviorfeaturessothatappropriateVj functionsareused

§ Incorporateuserbehaviorfeaturesintolearnedrankingfunction

§ Eitherofthesewaysofincorporatinguserbehaviorsignalsimproveranking

5/21/17

Resources§ S.E.RobertsonandH.Zaragoza.2009.TheProbabilistic

RelevanceFramework:BM25andBeyond.FoundationsandTrendsinInformationRetrieval 3(4):333-389.

§ K.Spärck Jones,S.Walker,andS.E.Robertson.2000.Aprobabilisticmodelofinformationretrieval:Developmentandcomparativeexperiments.Part1.InformationProcessingandManagement779–808.

§ T.Joachims.OptimizingSearchEnginesusingClickthroughData.2002.SIGKDD.

§ E.Agichtein,E.Brill,S.Dumais.2006.ImprovingWebSearchRankingByIncorporatingUserBehaviorInformation.2006.SIGIR.

Information Retrieval - Stanford University · 5/21/17 1 Introduction to Information Retrieval...

Documents

Transcript of Information Retrieval - Stanford University · 5/21/17 1 Introduction to Information Retrieval...

Introduction to Information Retrieval Introduction to Information Retrieval CS276 Information Retrieval…

Introduction to Information Retrieval Introduction to Information Retrieval Evaluation.

Introduction to Information Retrieval - Stanford University...Introduction to Information Retrieval Introduction to Information Retrieval CS276: Information Retrieval and Web Search

week 10 Information retrieval presentationlsir retrieval...Information Retrieval and Data Mining Part 1 – Information Retrieval. 2 ©2006/7, Karl Aberer, ... "web information retrieval"

Using BM25F for Semantic Search

Multimedia Information Retrieval -Slidesdcalab.unipv.it/wp-content/uploads/2015/02/video_search.pdf · Multimedia Information Retrieval. Multimedia Information Retrieval Motivation

Introduction*to Information*Retrieval · Introduction*to*Information*Retrieval Introduction*to Information*Retrieval CS276:*Information*Retrieval*and*Web*Search Pandu*Nayak*and*Prabhakar*Raghavan

Introduction to Information Retrieval XML Retrieval.

CS336: Intelligent Information Retrieval Why is Information Retrieval difficult?

Information Retrieval · Information Retrieval vs. Relational Databases § Information Retrieval(in contrast) § datais mostly unstructuredwith no precise semantics § information

Introduction to Information Retrieval Information Retrieval Models

Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Media Retrieval Information Retrieval Image Retrieval Video Retrieval Audio Retrieval Information Retrieval Image Retrieval Video Retrieval Audio Retrieval.

WMES3103 INFORMATION RETRIEVAL WEEK 1 AND 2. WHAT IS INFORMATION RETRIEVAL? Information Retrieval – IR Information Retrieval – IR Information Information.

BM25, BM25F, and User Behavior Chris Manning and Pandu Nayak

Multimedia Information Retrieval -Slidesmhd/8337sp09/mir.pdfMultimedia Information RetrievalMultimedia Information Retrieval Text-based Information Retrieval ¾Too many images to annotate

Probabilistic Models in Information Retrieval SI650: Information Retrieval

Introduction to Information Retrieval Information ...zhaojin/cs3245_2018/w11.pdf · Introduction to Information Retrieval CS3245 Information Retrieval Lecture 11: Probabilistic IR

Introduction to Information Retrieval - cis.csuohio.educis.csuohio.edu/~sschung/cis612/Lecture_Intro_IR_PhrasePositioning.pdf · Information Retrieval • Information Retrieval (IR)

Introductionto InformationRetrieval · IntroductiontoInformationRetrieval Introductionto InformationRetrieval CS276:InformationRetrievalandWebSearch PanduNayakandPrabhakarRaghavan