Offline Testing Search Engine Results

Transcript of Offline Testing Search Engine Results

Page 1: Offline Testing Search Engine Results

Offline Testing Search Engine Results

Page 2: Offline Testing Search Engine Results

the experiment

transition from FAST to Solr

[Diagram labels: FAST, Solr]

My client was transitioning its site search engine from FAST to SOLR and wanted the new search engine to return matching search results. This is unlike most experiments, which involve some form of optimization.

Page 3: Offline Testing Search Engine Results

testing methodology

Offline testing doesn't happen in isolation and is often a precursor to A/B testing.

• A/B testing: large-scale quantitative user testing
• user testing: small-scale qualitative testing
• offline testing: focus on differences in search results
• performance testing: focus on speed and queries per second
• regression testing: is anything broken?

Page 4: Offline Testing Search Engine Results

technology stack

[Diagram: real user queries → FAST and SOLR → results → analysis]

We ran thousands of real user queries against each search engine and saved the results for analysis.
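
The collection step described above could be sketched roughly as follows. This is a minimal illustration, not the harness used in the project: the `SearchFn` interface, the `collect_results` and `save_results` helpers, and the stand-in engines with their made-up data are all assumptions introduced here.

```python
import json
from typing import Callable, Dict, Iterable, List

# Hypothetical interface: each engine is wrapped in a function that takes a
# query string and returns an ordered list of result identifiers.
SearchFn = Callable[[str], List[str]]


def collect_results(
    queries: Iterable[str], engines: Dict[str, SearchFn]
) -> Dict[str, Dict[str, List[str]]]:
    """Run every query against every engine and keep the ranked result IDs."""
    saved: Dict[str, Dict[str, List[str]]] = {}
    for query in queries:
        saved[query] = {name: list(search(query)) for name, search in engines.items()}
    return saved


def save_results(saved: Dict[str, Dict[str, List[str]]], path: str) -> None:
    """Write the collected results to a JSON file for later analysis."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(saved, handle, indent=2, ensure_ascii=False)


# Stand-in engines backed by made-up data; a real run would call each search API.
FAKE_FAST = {"solar panels": ["A", "B", "C"]}
FAKE_SOLR = {"solar panels": ["B", "C", "D"], "wind*": ["X"]}

saved = collect_results(
    ["solar panels", "wind*"],
    {
        "FAST": lambda q: FAKE_FAST.get(q, []),
        "SOLR": lambda q: FAKE_SOLR.get(q, []),
    },
)
```

Keeping the raw ranked lists, rather than only the counts, means the same saved file can feed the count, overlap and rank-correlation metrics.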

Page 5: Offline Testing Search Engine Results

offline testing metrics

Your choice of metrics will depend on your goals and availability of information.

• differences in results counts: use either the mean absolute difference in results, also known as mean absolute error (MAE), or root mean squared error (RMSE) to measure differences in counts. It is helpful to express this metric in relative terms, as a percentage of the average number of results returned by the existing search engine.

• how many results overlap: use the Jaccard index or the Sørensen-Dice index to measure similarity across sets of results.

• rank correlation: use Spearman's rank correlation coefficient (Spearman's rho) to measure correlation across overlapping results.

Page 6: Offline Testing Search Engine Results

offline testing metrics

• precision and recall: precision and recall are often used to measure the quality of information retrieval systems. These metrics implicitly assume that the current set of results is a "gold standard", and are best suited when results are sorted by relevance.

• click metrics: if you have a record of user interactions with your existing set of search results, you could forecast click metrics for the new search engine. Common metrics include the average number of clicks per query, as well as the average or mean click rank.
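
As a rough illustration of the precision and recall bullet, the sketch below treats the existing engine's results as the gold standard and scores the new engine against them. The function name `precision_recall` and the choice to return 0.0 for an empty list are assumptions made for this example; the two result lists are the illustrative FAST and SOLR lists used elsewhere in this deck.

```python
from typing import List, Tuple


def precision_recall(gold: List[str], candidate: List[str]) -> Tuple[float, float]:
    """Score candidate results against gold results, both given as result IDs.

    precision = shared results / number of candidate results
    recall    = shared results / number of gold results
    Returning 0.0 when a list is empty is a convention chosen for this sketch.
    """
    gold_set, candidate_set = set(gold), set(candidate)
    hits = len(gold_set & candidate_set)
    precision = hits / len(candidate_set) if candidate_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return precision, recall


fast_results = list("ABCDEFGHIJ")  # existing engine, treated as the gold standard
solr_results = list("BWCXDYZ")     # new engine

precision, recall = precision_recall(fast_results, solr_results)
# Three results (B, C, D) are shared: precision is 3/7 and recall is 3/10.
```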

Page 7: Offline Testing Search Engine Results

analysis framework

[Scatter plot: # FAST results against # Solr results, with the line Solr = FAST marked; every point represents a query]

Our clients find that a scatter plot is a helpful way to visualize differences in counts.

In an ideal world, every query would lie on a straight line passing through the origin (indicative of a perfect match in counts between the old and the new engine).

However, bugs and differences in indexation can force points away from that line and onto either the X or the Y axis.
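
Points that sit on an axis are queries for which one engine returned nothing. One way to make that visible without a plot is to group queries into four quadrants by whether each engine returned any results. The sketch below assumes result counts are already available per query; the `quadrant` helper and the example counts are made up for illustration.

```python
from collections import Counter


def quadrant(fast_count: int, solr_count: int) -> str:
    """Label a query by whether each engine returned any results."""
    fast_part = "FAST > 0" if fast_count > 0 else "FAST = 0"
    solr_part = "SOLR > 0" if solr_count > 0 else "SOLR = 0"
    return f"{fast_part}, {solr_part}"


# Made-up (FAST count, SOLR count) pairs, keyed by query.
example_counts = {
    "query one": (120, 118),
    "query two": (15, 0),
    "query three": (0, 2_000_000),
    "query four": (0, 0),
    "query five": (42, 42),
}

# Number of queries that fall into each quadrant.
tally = Counter(quadrant(f, s) for f, s in example_counts.values())
```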

Page 8: Offline Testing Search Engine Results

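differences in counts

Example for one query:
FAST results: A B C D E F G H I J
SOLR results: B W C X D Y Z

Across query i: $F_i$ and $S_i$, where $F_i$ is the number of FAST results for query i and $S_i$ is the number of SOLR results for query i.

Across all queries:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(F_i - S_i\right)^2}$$

The Root Mean Squared Error measures the average difference in the number of search results found.

The Coefficient of Variation expresses the RMSE in relative terms:

$$\mathrm{CV} = \frac{\mathrm{RMSE}}{\text{Avg. \# FAST results}}$$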
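
A minimal sketch of the two count metrics, assuming the per-query result counts are held in two equal-length lists (so n is the number of queries). The function names and the example counts are made up for illustration. Returning `None` when the average FAST count is zero is a choice made here, on the assumption that this is the situation an "NA" would stand for.

```python
import math
from typing import List, Optional


def rmse(fast_counts: List[int], solr_counts: List[int]) -> float:
    """Root mean squared difference between per-query result counts."""
    if len(fast_counts) != len(solr_counts) or not fast_counts:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(f - s) ** 2 for f, s in zip(fast_counts, solr_counts)]
    return math.sqrt(sum(squared) / len(squared))


def coefficient_of_variation(
    fast_counts: List[int], solr_counts: List[int]
) -> Optional[float]:
    """RMSE divided by the average number of FAST results (None if that is 0)."""
    mean_fast = sum(fast_counts) / len(fast_counts)
    if mean_fast == 0:
        return None
    return rmse(fast_counts, solr_counts) / mean_fast


# Made-up counts for four queries; the first pair mirrors the 10 vs 7 example.
fast_counts = [10, 4, 0, 25]
solr_counts = [7, 4, 3, 20]

error = rmse(fast_counts, solr_counts)
relative_error = coefficient_of_variation(fast_counts, solr_counts)
```

Multiplying `relative_error` by 100 gives the percentage form of the CV.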

Page 9: Offline Testing Search Engine Results

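overlap

Example for one query:
FAST results: A B C D E F G H I J
SOLR results: B W C X D Y Z

[Panel labels: across query i, across all queries]

You could use the Sørensen-Dice index to measure the similarity of sets for each query. It is bounded between 0 and 1 (1 is desirable and indicative of perfect overlap).

$$\mathrm{Sim}_i(F_i, S_i) = \frac{2\,\lvert F_i \cap S_i \rvert}{\lvert F_i \rvert + \lvert S_i \rvert}$$

$F_i$ is the set of FAST results for query i at rank n.
$S_i$ is the set of SOLR results for query i at rank n.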
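
The overlap measure can be sketched with Python sets, shown here alongside the Jaccard index for comparison. The function names are chosen for this example, and returning 1.0 when both result sets are empty is an assumption (two empty sets are treated as identical). The inputs are the two example lists from this slide.

```python
from typing import Iterable


def sorensen_dice(fast_results: Iterable[str], solr_results: Iterable[str]) -> float:
    """2 * |F ∩ S| / (|F| + |S|), treating two empty sets as identical."""
    f, s = set(fast_results), set(solr_results)
    if not f and not s:
        return 1.0
    return 2 * len(f & s) / (len(f) + len(s))


def jaccard(fast_results: Iterable[str], solr_results: Iterable[str]) -> float:
    """|F ∩ S| / |F ∪ S|, treating two empty sets as identical."""
    f, s = set(fast_results), set(solr_results)
    union = f | s
    if not union:
        return 1.0
    return len(f & s) / len(union)


fast_results = list("ABCDEFGHIJ")
solr_results = list("BWCXDYZ")

print(round(sorensen_dice(fast_results, solr_results), 3))  # 0.353 (6 / 17)
print(round(jaccard(fast_results, solr_results), 3))        # 0.214 (3 / 14)
```

Only B, C and D appear in both lists, so the Sørensen-Dice index is 2 × 3 / (10 + 7) and the Jaccard index is 3 / 14.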

Page 10: Offline Testing Search Engine Results

rank correlation

Example for one query:
FAST results: A B C D E F G H I J
SOLR results: B W C X D Y Z

[Panel labels: across query i, across all queries]

You could use Spearman's rank to calculate correlations across overlapping results. This metric is bounded between -1 and 1 (1 is desirable and indicative of perfect positive correlation).

$$\rho_i = 1 - \frac{6\sum_j d_j^2}{n\,(n^2 - 1)}$$

$d_j$ is the difference in ranks for the jth result and is calculated as: FAST rank$_j$ - SOLR rank$_j$.

$n$ is the number of overlapping results for query i.
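
A minimal sketch of the per-query calculation follows. It makes one assumption that the slide does not state: the overlapping results are re-ranked from 1 to n within each engine's ordering before the rank differences are taken, which keeps the coefficient within -1 and 1. It also assumes each list contains no duplicate result IDs. The function name is chosen for this example.

```python
from typing import List, Optional


def spearman_rho_overlap(
    fast_results: List[str], solr_results: List[str]
) -> Optional[float]:
    """Spearman's rho over the results that both engines returned.

    Returns None when fewer than two results overlap, because the formula
    divides by n * (n^2 - 1).
    """
    fast_set, solr_set = set(fast_results), set(solr_results)
    shared_in_fast_order = [r for r in fast_results if r in solr_set]
    shared_in_solr_order = [r for r in solr_results if r in fast_set]
    n = len(shared_in_fast_order)
    if n < 2:
        return None
    # Assumption: ranks are re-assigned 1..n among the overlapping results only.
    fast_rank = {r: i for i, r in enumerate(shared_in_fast_order, start=1)}
    solr_rank = {r: i for i, r in enumerate(shared_in_solr_order, start=1)}
    sum_d_squared = sum((fast_rank[r] - solr_rank[r]) ** 2 for r in shared_in_fast_order)
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


fast_results = list("ABCDEFGHIJ")
solr_results = list("BWCXDYZ")

print(spearman_rho_overlap(fast_results, solr_results))  # 1.0
```

In the example, B, C and D overlap and appear in the same relative order in both lists, so every rank difference is zero.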

Page 11: Offline Testing Search Engine Results

differences in counts: before

Quadrant             Query Count   Query Share (%)   RMSE        CV (%)
FAST > 0, SOLR > 0   7,049         82%               1,749,463   8,588%
FAST > 0, SOLR = 0   388           5%                1,718       403%
FAST = 0, SOLR > 0   107           1%                7,078,404   NA
FAST = 0, SOLR = 0   1,037         12%               0           NA
Overall              8,581         100%              1,771,711   10,575%

Query type            Query Count   Query Share (%)   RMSE     CV (%)
Wildcard queries      690           8%                99,894   298%
Loose phrase queries  925           11%               15,905   188%
Plural-form queries   1,090         13%               13,845   207%

Automated insights by query type. Note that a query may be associated with more than one type.

Differences in counts are driven by a handful of queries with 0 or few results in FAST and millions in SOLR.

Page 12: Offline Testing Search Engine Results

differences in counts: after

Quadrant             Query Count   Query Share (%)   RMSE    CV (%)
FAST > 0, SOLR > 0   5,166         61%               213     2%
FAST > 0, SOLR = 0   125           1%                4,332   1,023%
FAST = 0, SOLR > 0   20            0%                131     NA
FAST = 0, SOLR = 0   3,169         37%               0       NA
Overall              8,480         100%              418     28%

Query type            Query Count   Query Share (%)   RMSE    CV (%)
Wildcard queries      665           8%                159     4%
Loose phrase queries  910           11%               137     19%
Plural-form queries   2,356         28%               1,009   69%

After several iterations, differences in counts are down to 2% on average across queries that return results in FAST and SOLR. However, there is still more work to be done.

Page 13: Offline Testing Search Engine Results

overlap: before

Result Counts*   Result Overlap   Jaccard Index   Sørensen-Dice Index
5+               1                0.21            0.26
10+              3                0.23            0.30
20+              7                0.26            0.34

* Measured across the first 5, 10 and 20 results for queries that returned 5+, 10+ and 20+ results respectively.

On average, just 7 of the first 20 results (and 1 of the first 5 results) overlapped at the start of the process.
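
The top-k measurement described in the footnote could be aggregated along these lines. This is a sketch under stated assumptions: a query is included only if both engines returned at least k results (the slide does not say which engine the "5+" condition applies to), each metric is computed per query and then averaged, and the function name and example data are made up.

```python
from typing import List, Optional, Tuple


def mean_overlap_at_k(
    result_pairs: List[Tuple[List[str], List[str]]], k: int
) -> Optional[Tuple[float, float, float]]:
    """Average overlap count, Jaccard index and Sørensen-Dice index at depth k.

    Each pair holds the ranked FAST and SOLR results for one query.
    Returns None if no query has at least k results in both engines.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    overlaps, jaccards, dices = [], [], []
    for fast_results, solr_results in result_pairs:
        # Assumption: skip queries where either engine returned fewer than k results.
        if len(fast_results) < k or len(solr_results) < k:
            continue
        f, s = set(fast_results[:k]), set(solr_results[:k])
        shared = len(f & s)
        overlaps.append(shared)
        jaccards.append(shared / len(f | s))
        dices.append(2 * shared / (len(f) + len(s)))
    if not overlaps:
        return None
    n = len(overlaps)
    return sum(overlaps) / n, sum(jaccards) / n, sum(dices) / n


# Made-up data: the third query is skipped at k = 5 because SOLR returned 2 results.
pairs = [
    (list("ABCDEFGHIJ"), list("BWCXDYZ")),
    (list("PQRST"), list("PQRST")),
    (list("KLMNO"), list("KL")),
]

summary = mean_overlap_at_k(pairs, k=5)
```

Because the indices are averaged per query, the averaged Jaccard and Sørensen-Dice values generally differ from the values obtained by plugging the average overlap count into the formulas.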

Page 14: Offline Testing Search Engine Results

overlap: after

Result Counts*   Result Overlap   Jaccard Index   Sørensen-Dice Index
5+               5                0.89            0.92
10+              9                0.90            0.93
20+              19               0.91            0.94

* Measured across the first 5, 10 and 20 results for queries that returned 5+, 10+ and 20+ results respectively.

After several iterations, overlap has risen to 19 results on page 1, up from 7 at the start of the process. Crucially, 5 of the first 5 results overlap on average.

Page 15: Offline Testing Search Engine Results

rank correlation: before

Result Counts*   Spearman's Rank (date desc.)
5+               0.97
10+              0.96
20+              0.96

* Measured across the first 5, 10 and 20 results for queries that returned 5+, 10+ and 20+ results respectively.

Rank correlation was particularly strong from the outset at 0.96/1, as almost all searches are sorted by date rather than by relevance.

Page 16: Offline Testing Search Engine Results

rank correlation: after

Result Counts*   Spearman's Rank (date desc.)
5+               0.98
10+              0.98
20+              0.99

* Measured across the first 5, 10 and 20 results for queries that returned 5+, 10+ and 20+ results respectively.

After several iterations, rank correlation across overlapping results has improved further to 0.99/1.

Page 17: Offline Testing Search Engine Results

This presentation illustrated how you could offline test search engine results. However, as every implementation is unique, please contact us to discuss your needs.