Understanding and visualizing solr explain information - Rafal Kuc
Understanding and visualising Solr explain information
Rafał Kuć, Marek Rogoziński, [email protected], [email protected], 18.10.2011
My Background
• Rafał Kuć
  - Working with Lucene since 2002
  - Working with Solr since 2007
• Solr.pl
  - Co-founder (with Marek Rogoziński)
• Area of expertise
  - Lucene and Solr consultant and architect for many major e-commerce sites in Poland
  - Author of „Solr 3.1 Cookbook” by Packt Publishing
  - Father, husband, Starcraft II player and a gardener after hours ☺
What I Will Cover
• Understanding and visualising Solr explain information
• How to make the information given by Apache Solr explain easily readable by a Solr user, including a not very technical one
• Context
  - Complicated explain made simple
  - Explain other made even simpler
• What’s next to come
A typical use case
The Challenge
• Common questions like:
  - Why was this document found?
  - Why wasn’t this document found?
  - Why is this document ranked higher than the other one?
  - Why does the results list look like this?
• Considerations
  - Do we always have to answer those questions?
• So how do we let users get the answers they want?
  - That’s how http://explain.solr.pl was born
Let’s look at a typical example
• You run a query
  - q=ddr&defType=dismax&qf=name^1000+description^100&bf=pow(price,1.5)&debugQuery=true&indent=true
• And you see the explain information
1.6771803 = (MATCH) sum of:
  0.64883727 = (MATCH) max of:
    0.64883727 = (MATCH) weight(name:ddr^1000.0 in 6), product of:
      0.99999994 = queryWeight(name:ddr^1000.0), product of:
        1000.0 = boost
        2.446919 = idf(docFreq=3, maxDocs=17)
        4.0867718E-4 = queryNorm
      0.6488373 = (MATCH) fieldWeight(name:ddr in 6), product of:
        1.4142135 = tf(termFreq(name:ddr)=2)
        2.446919 = idf(docFreq=3, maxDocs=17)
        0.1875 = fieldNorm(field=name, doc=6)
  1.028343 = (MATCH) FunctionQuery(pow(float(price),const(1.5))), product of:
    2516.272 = pow(float(price)=185.0,const(1.5))
    1.0 = boost
    4.0867718E-4 = queryNorm
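For reference, the request that produces this kind of explain output can be assembled programmatically. The sketch below only builds the URL and does not contact a server; the host, port and `/solr/select` path are assumptions (a typical local install), not something stated in the talk.

```python
from urllib.parse import urlencode

# Assumed endpoint of a local Solr instance; adjust to your deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Parameters taken from the example query in the talk.
params = {
    "q": "ddr",
    "defType": "dismax",
    "qf": "name^1000 description^100",  # the '+' in the raw URL is an encoded space
    "bf": "pow(price,1.5)",
    "debugQuery": "true",               # asks Solr to include the explain section
    "indent": "true",
}

# urlencode percent-encodes reserved characters such as '^', '(' and ','.
url = SOLR_SELECT_URL + "?" + urlencode(params)
print(url)
```

The explain information comes back in the debug part of the response, one entry per returned document.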
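Some theory
• tf – term frequency (how often the term occurs in the field of the document)
• df – document frequency (number of documents containing the term)
• idf – inverse document frequency
• norm – normalization factor
  - queryNorm – query normalization factor
  - fieldNorm – field normalization factor
• coord – coordination factor (rewards documents that match more of the query terms)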
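To make tf and idf concrete: in the default similarity used by Lucene 3.x, tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)). The sketch below reproduces two numbers from the example explain (tf for termFreq=2, idf for docFreq=3 and maxDocs=17). The function names are illustrative, and a custom similarity configured in Solr would give different values.

```python
import math

def tf(term_freq):
    """Term-frequency factor of the default similarity: sqrt(termFreq)."""
    return math.sqrt(term_freq)

def idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# Values as they appear in the explain output for name:ddr.
tf_ddr = tf(2)        # explain shows 1.4142135
idf_ddr = idf(3, 17)  # explain shows 2.446919
print(tf_ddr, idf_ddr)
```

A rarer term gives a higher idf: with docFreq=1 and maxDocs=17 the same formula yields about 3.14.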
Let’s take a look at it again
1.6771803 = (MATCH) sum of:
  0.64883727 = (MATCH) max of:
    0.64883727 = (MATCH) weight(name:ddr^1000.0 in 6), product of:
      0.99999994 = queryWeight(name:ddr^1000.0), product of:
        1000.0 = boost
        2.446919 = idf(docFreq=3, maxDocs=17)
        4.0867718E-4 = queryNorm
      0.6488373 = (MATCH) fieldWeight(name:ddr in 6), product of:
        1.4142135 = tf(termFreq(name:ddr)=2)
        2.446919 = idf(docFreq=3, maxDocs=17)
        0.1875 = fieldNorm(field=name, doc=6)
  1.028343 = (MATCH) FunctionQuery(pow(float(price),const(1.5))), product of:
    2516.272 = pow(float(price)=185.0,const(1.5))
    1.0 = boost
    4.0867718E-4 = queryNorm
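The explain is just nested arithmetic, so the final score can be recomputed from its leaf values. The sketch below does this for the example above, with the numbers copied from the explain; the variable names are illustrative. Small differences in the last digits are expected because Lucene computes in 32-bit floats.

```python
# Leaf values copied from the explain output.
boost, idf, query_norm = 1000.0, 2.446919, 4.0867718e-4
tf, field_norm = 1.4142135, 0.1875

# "product of" nodes: multiply the children.
query_weight = boost * idf * query_norm    # explain: 0.99999994
field_weight = tf * idf * field_norm       # explain: 0.6488373
term_score = query_weight * field_weight   # explain: 0.64883727

# "max of": dismax takes the best-scoring field; only name matched here.
dismax_score = max([term_score])

# Boost function bf=pow(price,1.5) for price=185.0.
function_score = 2516.272 * 1.0 * query_norm   # explain: 1.028343

# "sum of": add the top-level clauses.
total = dismax_score + function_score          # explain: 1.6771803
print(total)
```

Note that the price function contributes more to the final score than the text match itself, which is the kind of thing that is hard to spot in the raw output.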
A little more complicated example
36.50278 = (MATCH) sum of:
  1.54896 = (MATCH) sum of:
    0.46676102 = (MATCH) max of:
      0.46676102 = (MATCH) weight(name:hard^20.0 in 2), product of:
        0.5461986 = queryWeight(name:hard^20.0), product of:
          20.0 = boost
          2.734601 = idf(docFreq=2, maxDocs=17)
          0.009986806 = queryNorm
        0.8545628 = (MATCH) fieldWeight(name:hard in 2), product of:
          1.0 = tf(termFreq(name:hard)=1)
          2.734601 = idf(docFreq=2, maxDocs=17)
          0.3125 = fieldNorm(field=name, doc=2)
    0.46676102 = (MATCH) max of:
      0.46676102 = (MATCH) weight(name:drive^20.0 in 2), product of:
        0.5461986 = queryWeight(name:drive^20.0), product of:
          20.0 = boost
          2.734601 = idf(docFreq=2, maxDocs=17)
          0.009986806 = queryNorm
        0.8545628 = (MATCH) fieldWeight(name:drive in 2), product of:
          1.0 = tf(termFreq(name:drive)=1)
          2.734601 = idf(docFreq=2, maxDocs=17)
          0.3125 = fieldNorm(field=name, doc=2)
    0.61543787 = (MATCH) max of:
      0.098470055 = (MATCH) weight(manu:maxtor in 2), product of:
        0.03135923 = queryWeight(manu:maxtor), product of:
          3.1400661 = idf(docFreq=1, maxDocs=17)
          0.009986806 = queryNorm
        3.1400661 = (MATCH) fieldWeight(manu:maxtor in 2), product of:
          1.0 = tf(termFreq(manu:maxtor)=1)
          3.1400661 = idf(docFreq=1, maxDocs=17)
          1.0 = fieldNorm(field=manu, doc=2)
      0.61543787 = (MATCH) weight(name:maxtor^20.0 in 2), product of:
        0.6271846 = queryWeight(name:maxtor^20.0), product of:
          20.0 = boost
          3.1400661 = idf(docFreq=1, maxDocs=17)
          0.009986806 = queryNorm
        0.9812707 = (MATCH) fieldWeight(name:maxtor in 2), product of:
          1.0 = tf(termFreq(name:maxtor)=1)
          3.1400661 = idf(docFreq=1, maxDocs=17)
          0.3125 = fieldNorm(field=name, doc=2)
  34.95382 = (MATCH) FunctionQuery(float(price)), product of:
    350.0 = float(price)=350.0
    10.0 = boost
    0.009986806 = queryNorm
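Output like this becomes easier to work with once it is turned into a tree. The following is a minimal sketch of such a parser, assuming each nesting level is indented by two spaces and every line has the form `value = description`; it is an illustration only, not the code behind explain.solr.pl. The sample is an abbreviated version of the explain above, with the children of the text-match branch left out.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    value: float
    description: str
    children: list = field(default_factory=list)

def parse_explain(text, indent=2):
    """Build a tree from indented 'value = description' lines."""
    root = None
    stack = []  # stack[d] holds the most recent node seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        value, _, description = line.strip().partition(" = ")
        node = Node(float(value), description)
        if depth == 0:
            root = node
        else:
            stack[depth - 1].children.append(node)
        del stack[depth:]  # forget deeper nodes from earlier branches
        stack.append(node)
    return root

# Abbreviated sample: children of the first branch are omitted.
SAMPLE = """\
36.50278 = (MATCH) sum of:
  1.54896 = (MATCH) sum of:
  34.95382 = (MATCH) FunctionQuery(float(price)), product of:
    350.0 = float(price)=350.0
    10.0 = boost
    0.009986806 = queryNorm
"""

tree = parse_explain(SAMPLE)
# Share of each top-level clause in the final score.
shares = [(child.description, child.value / tree.value) for child in tree.children]
for description, share in shares:
    print(f"{share:6.1%}  {description}")
```

For this document the price function accounts for roughly 96% of the score, so the three text matches barely influence the ranking.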
And now, a real-life example
1.6287426 = (MATCH) sum of:
  0.8143703 = (MATCH) sum of:
    0.40718514 = (MATCH) max plus 0.01 times others of:
      4.154771E-7 = (MATCH) weight(description_nostemm:harry^10.0 in 36647), product of:
        4.4066886E-7 = queryWeight(description_nostemm:harry^10.0), product of:
          10.0 = boost
          7.5426636 = idf(docFreq=796, maxDocs=553224)
          5.8423506E-9 = queryNorm
        0.94283295 = (MATCH) fieldWeight(description_nostemm:harry in 36647), product of:
          1.0 = tf(termFreq(description_nostemm:harry)=1)
          7.5426636 = idf(docFreq=796, maxDocs=553224)
          0.125 = fieldNorm(field=description_nostemm, doc=36647)
      0.40718514 = (MATCH) weight(category_search:harri^2000000.0 in 36647), product of:
        0.123389944 = queryWeight(category_search:harri^2000000.0), product of:
          2000000.0 = boost
          10.559957 = idf(docFreq=38, maxDocs=553224)
          5.8423506E-9 = queryNorm
        3.2999864 = (MATCH) fieldWeight(category_search:harri in 36647), product of:
          1.0 = tf(termFreq(category_search:harri)=1)
          10.559957 = idf(docFreq=38, maxDocs=553224)
          0.3125 = fieldNorm(field=category_search, doc=36647)
      5.976383E-8 = (MATCH) weight(description:harri in 36647), product of:
        4.2931266E-8 = queryWeight(description:harri), product of:
          7.348286 = idf(docFreq=967, maxDocs=553224)
          5.8423506E-9 = queryNorm
        1.3920817 = (MATCH) fieldWeight(description:harri in 36647), product of:
          1.7320508 = tf(termFreq(description:harri)=3)
          7.348286 = idf(docFreq=967, maxDocs=553224)
          0.109375 = fieldNorm(field=description, doc=36647)
    0.40718514 = (MATCH) max plus 0.01 times others of:
      5.0300997E-7 = (MATCH) weight(description_nostemm:potter^10.0 in 36647), product of:
        4.84872E-7 = queryWeight(description_nostemm:potter^10.0), product of:
          10.0 = boost
          8.299262 = idf(docFreq=373, maxDocs=553224)
          5.8423506E-9 = queryNorm
        1.0374078 = (MATCH) fieldWeight(description_nostemm:potter in 36647), product of:
          1.0 = tf(termFreq(description_nostemm:potter)=1)
          8.299262 = idf(docFreq=373, maxDocs=553224)
          0.125 = fieldNorm(field=description_nostemm, doc=36647)
      0.40718514 = (MATCH) weight(category_search:Potter^2000000.0 in 36647), product of:
        0.123389944 = queryWeight(category_search:Potter^2000000.0), product of:
          2000000.0 = boost
          10.559957 = idf(docFreq=38, maxDocs=553224)
          5.8423506E-9 = queryNorm
        3.2999864 = (MATCH) fieldWeight(category_search:Potter in 36647), product of:
          1.0 = tf(termFreq(category_search:Potter)=1)
          10.559957 = idf(docFreq=38, maxDocs=553224)
          0.3125 = fieldNorm(field=category_search, doc=36647)
      5.7398886E-8 = (MATCH) weight(description:Potter in 36647), product of:
        4.656172E-8 = queryWeight(description:Potter), product of:
          7.9696894 = idf(docFreq=519, maxDocs=553224)
          5.8423506E-9 = queryNorm
        1.2327484 = (MATCH) fieldWeight(description:Potter in 36647), product of:
          1.4142135 = tf(termFreq(description:Potter)=2)
          7.9696894 = idf(docFreq=519, maxDocs=553224)
          0.109375 = fieldNorm(field=description, doc=36647)
  1.8327936E-6 = (MATCH) max plus 0.01 times others of:
    1.8327936E-6 = (MATCH) weight(description_nostemm:"harry potter"~100^10.0 in 36647), product of:
      9.255408E-7 = queryWeight(description_nostemm:"harry potter"~100^10.0), product of:
        10.0 = boost
        15.841926 = idf(description_nostemm: harry=796 potter=373)
        5.8423506E-9 = queryNorm
      1.9802407 = fieldWeight(description_nostemm:"harry potter" in 36647), product of:
        1.0 = tf(phraseFreq=1.0)
        15.841926 = idf(description_nostemm: harry=796 potter=373)
        0.125 = fieldNorm(field=description_nostemm, doc=36647)
  0.81437016 = (MATCH) sum of:
    0.40718508 = (MATCH) weight(category_the:harri in 36647), product of:
      0.12338993 = queryWeight(category_the:harri), product of:
        10.559957 = idf(docFreq=38, maxDocs=553224)
        0.011684701 = queryNorm
      3.2999864 = (MATCH) fieldWeight(category_the:harri in 36647), product of:
        1.0 = tf(termFreq(category_the:harri)=1)
        10.559957 = idf(docFreq=38, maxDocs=553224)
        0.3125 = fieldNorm(field=category_the, doc=36647)
    0.40718508 = (MATCH) weight(category_the:Potter in 36647), product of:
      0.12338993 = queryWeight(category_the:Potter), product of:
        10.559957 = idf(docFreq=38, maxDocs=553224)
        0.011684701 = queryNorm
      3.2999864 = (MATCH) fieldWeight(category_the:Potter in 36647), product of:
        1.0 = tf(termFreq(category_the:Potter)=1)
        10.559957 = idf(docFreq=38, maxDocs=553224)
        0.3125 = fieldNorm(field=category_the, doc=36647)
  3.394099E-7 = (MATCH) FunctionQuery(pow(int(sold),const(1.5))), product of:
    58.09475 = pow(int(sold)=15,const(1.5))
    1.0 = boost
    5.8423506E-9 = queryNorm
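The "max plus 0.01 times others of" nodes come from dismax with a tie breaker of 0.01: the best-scoring field counts in full and every other matching field contributes 1% of its score. A short sketch of that rule, using the three field scores for the first term from the explain above (the helper name is illustrative):

```python
def dismax(scores, tie):
    """Best clause in full, plus `tie` times the sum of the remaining clauses."""
    best = max(scores)
    return best + tie * (sum(scores) - best)

# Per-field scores for the first query term, copied from the explain:
# description_nostemm, category_search (boost 2000000) and description.
harry_scores = [4.154771e-7, 0.40718514, 5.976383e-8]

harry_group = dismax(harry_scores, tie=0.01)  # explain shows 0.40718514
print(harry_group)
```

The category_search clause is about a million times larger than the other two, so the huge boost makes the description fields irrelevant for ranking. That is easy to miss in the raw text and obvious once the numbers are put side by side.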
Let’s visualize now
History view
Basic information
The real thing
Even more ☺
What if we can’t match?
And the explain for a non-matching document
What you gain from explain.solr.pl
• View Solr explain information in a human-readable form
• Easily recognize the most influential elements of the scoring process
• Answer the questions faster
• More things to come in the future
Plans for the future
• Support for more formats of Apache Solr explain (right now, only Solr 3.x is supported)
• Visualisation of additional data
• More functionality, such as:
  - query problems analysis
  - query syntax analysis and explanation
  - query time analysis and visualization
  - result comparison between cores or instances
• Very distant future: an additional web application deployed alongside Solr to enable real-time analysis of the influence of boosts
Wrap Up
• http://explain.solr.pl should be available very soon (probably end of October or mid-November)
• Code of explain.solr.pl will be available on GitHub soon after the initial release
• There will be a Java version of http://explain.solr.pl which will cover much more information
Sources
• Links
  - http://www.solr.pl
  - http://explain.solr.pl
  - http://lucene.apache.org ☺
• We would like to thank:
  - Łukasz Lewandowski ( http://llewandowski.pl/ ) for his work on the GUI
  - Hubert ‘depesz’ Lubaczewski ( http://depesz.com ) for the idea ☺
Contact
• Rafał Kuć
  - [email protected]
  - http://solr.pl
• Marek Rogoziński
  - [email protected]
  - http://solr.pl
Thank you