Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Chris "Hoss" Hostetter - 2017-09-14

https://home.apache.org/~hossman/rev2017/
https://twitter.com/_hossman
https://www.lucidworks.com/

Abstract: This intermediate session for existing Solr users will provide a Deep Dive look into the lifecycle of a Solr Search Request. We will drill down through each layer of code, discussing what happens at each stage -- including when & how inter-node communication takes place in a multi-node SolrCloud cluster. Along the way, we will also review the various places where users can configure existing (or custom written) plugins to override or amend the default behavior.

Lifecycle of a Solr Search Request https://people.apache.org/~hossman/rev2017/


Page 2: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Agenda

Deep Dive look into the lifecycle of 4 Solr Search Requests...

Single Node: Single SolrCore
  1. Simple Query
  2. Facet Query

SolrCloud: 2 Shards + 2 Replicas
  3. Simple Query
  4. Facet Query

...and where various types of Plugins can be used.


Page 3: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Simple Query
Single Node: Single SolrCore

bin/solr -e techproducts

http://localhost:8983/solr/techproducts/select ? q = ipod & sort = inStock desc, score desc & fl = id, name & rows = 10

This sample paginated query is based on the techproducts example configs & data that have been included in every release of Solr since it was first open sourced.

I have a nostalgic affection for this silly little dataset.
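For readers who want to issue the request from a script, here is a minimal sketch of building the URL-encoded form of the query shown above. The helper name `build_select_url` is made up for illustration; only the host, core name, and parameters come from the slide.

```python
from urllib.parse import urlencode


def build_select_url(base, params):
    """Append URL-encoded query params to a /select endpoint (illustrative helper)."""
    return base + "?" + urlencode(params)


# Parameters copied from the sample query on this slide.
params = {
    "q": "ipod",
    "sort": "inStock desc, score desc",
    "fl": "id, name",
    "rows": 10,
}
url = build_select_url("http://localhost:8983/solr/techproducts/select", params)
print(url)
```

The slide writes the params with spaces for readability; a real client has to encode the spaces and commas, which is all `urlencode` does here.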


Page 4: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

HTTP (Jetty)
  ...:8983/solr/techproducts/select?...
  Solr Webapp: /solr ➔
    UI: HTML, Javascript, Images, CSS
    SolrDispatchFilter
      CoreContainer
        /techproducts ➔ SolrCore: techproducts
          /select? ➔ RequestHandler
          wt=json ➔ ResponseWriter
        SolrCore: foo
        SolrCore: etc...

Purple: The HTTP layer, currently implemented by Jetty
Blue: Solr runs as a "webapp" inside the Jetty Servlet container (but that's just an implementation detail)
Black: The key pieces of the Solr webapp: misc "flat files" that power the Solr UI, and the SolrDispatchFilter which is responsible for mapping all HTTP request/responses into their internal Solr representations and executing them
Red: CoreContainer is a singleton responsible for managing the lifecycle of SolrCores
Green: each SolrCore encapsulates the configs & data for a single "index" (which in a SolrCloud configuration would be a replica of some shard of some collection)
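The path-to-plugin mapping described in this legend can be illustrated with a toy router. This is only a sketch of the idea, not Solr's SolrDispatchFilter code: the function `route`, its return shape, and the `default_wt` fallback are all assumptions made for the example.

```python
from urllib.parse import urlsplit, parse_qs


def route(url, core_names, default_wt="json"):
    """Split a request URL into webapp, core, handler path and response writer.

    Toy illustration only; default_wt is an assumed fallback, not a Solr fact.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    webapp, core = segments[0], segments[1]
    handler = "/" + "/".join(segments[2:])
    if core not in core_names:
        raise LookupError(f"no such core: {core}")
    wt = parse_qs(parts.query).get("wt", [default_wt])[0]
    return {"webapp": webapp, "core": core, "handler": handler, "wt": wt}


routed = route(
    "http://localhost:8983/solr/techproducts/select?q=ipod&wt=json",
    core_names={"techproducts", "foo"},
)
print(routed)
```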


Page 5: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

SolrCore: techproducts

SolrRequestHandlers
  SearchHandler: /select
    - initParams
      - df = text (default)
    - components (implicit)
      - query
      - etc...
  SearchHandler: /etc...
  UpdateRequestHandler: /etc...

SearchComponents
  QueryComponent: query
    - prepare()
      - df=text&q=ipod ➔ Query
      - etc...
    - process()
      - etc...
  FacetComponent: facet
  etc...

Green: The SolrCore used for this (HTTP) request
Black: Named instances of (pluggable) SolrRequestHandlers. SearchHandler is the most common, and it uses a configurable list of SearchComponents
Red: Named instances of (pluggable) SearchComponents. QueryComponent is the only one used in this simple request. All SearchComponents implement prepare() & process() methods, which are called by SearchHandler
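The relationship between a SearchHandler and its components can be sketched as follows. This is a simplified model written for illustration (Solr itself is Java); the class shapes and the `trace` list are assumptions, and the only behavior modeled is that defaults such as df=text are merged in and that every component's prepare() runs before any component's process().

```python
class SearchComponent:
    """Toy stand-in for a named component with prepare() and process()."""
    name = "component"

    def prepare(self, params, trace):
        trace.append(f"{self.name}.prepare")

    def process(self, params, trace):
        trace.append(f"{self.name}.process")


class QueryComponent(SearchComponent):
    name = "query"


class FacetComponent(SearchComponent):
    name = "facet"


class SearchHandler:
    def __init__(self, defaults, components):
        self.defaults = defaults        # e.g. df=text from initParams
        self.components = components    # ordered list of components

    def handle(self, request_params):
        # Request params override the configured defaults.
        params = {**self.defaults, **request_params}
        trace = []
        for component in self.components:
            component.prepare(params, trace)
        for component in self.components:
            component.process(params, trace)
        return params, trace


handler = SearchHandler({"df": "text"}, [QueryComponent(), FacetComponent()])
params, trace = handler.handle({"q": "ipod"})
print(params, trace)
```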


Page 6: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

query: QueryComponent.prepare()
  rows=10 ➔ ok?
  fl=id,name ➔ ok?
  q ➔ LuceneQParser + (df=text ➔ text) + "ipod" ➔ TermQuery
  sort ➔ ( "inStock desc" ➔ bool ➔ BoolField.getSortField(inStock,desc) + "score desc" ➔ SortField.SCORE ) ➔ Sort

QParserPlugins
  LuceneQParser
  DismaxQParser
  etc...

IndexSchema - SchemaFields ➔ FieldTypes
  TextField: text - Analyzer - Similarity - etc...
  TextField: etc.. - Analyzer - Similarity - etc...
  BoolField: bool - Analyzer - Similarity - getSortField - etc...

SolrIndexSearcher

Red: QueryComponent.prepare() and its basic logic for validating & parsing the basic request params
Green: Named instances of (pluggable) QParserPlugins for parsing query strings (q & fq params). Here, the (implicit) default LuceneQParser is used
Orange: The IndexSchema, which contains named SchemaFields (or dynamicFields) which map to...
Purple: Named instances of (pluggable) FieldTypes which dictate how the field names mapped to them are parsed, indexed, sorted, queried, etc...
Blue: The SolrIndexSearcher is ultimately what will be queried with these parsed queries & sort objects
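As a rough illustration of the sort handling described above, the sketch below splits a sort param into clauses and looks each field up in a schema so its field type can decide how to sort. The function `parse_sort` and the dict-based schema are hypothetical; function sorts and Solr's real error handling are deliberately left out.

```python
def parse_sort(spec, schema):
    """Parse 'field dir, field dir' into (field, field_type, direction) tuples.

    schema maps field name -> field type name (hypothetical, simplified).
    """
    clauses = []
    for clause in spec.split(","):
        field, direction = clause.split()
        if direction not in ("asc", "desc"):
            raise ValueError(f"bad sort direction: {direction}")
        if field == "score":
            field_type = "score"          # special case, like SortField.SCORE
        elif field in schema:
            field_type = schema[field]    # the field type supplies the sort logic
        else:
            raise ValueError(f"unknown sort field: {field}")
        clauses.append((field, field_type, direction))
    return clauses


sort = parse_sort("inStock desc, score desc", {"inStock": "bool", "text": "text"})
print(sort)
```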


Page 7: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

query: QueryComponent.process()
  search(Query,filters[],start,rows,Sort,...) ➔ DocList

SolrIndexSearcher.search(...)
  window(start, rows, windowSize)
  (queryResultCache? | Index) ➔ DocList

SolrIndexSearcher
  filterCache
  queryResultCache
  documentCache
  IndexReader - InvertedIndex - Stored Fields

QueryResponseWriters
  JsonResponseWriter: DocList { + searcher.doc(#) ➔ Stored Fields } ➔ Bytes ➔ HTTP...
  XmlResponseWriter
  etc...


Page 8: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Red: QueryComponent.process(), which uses the SolrIndexSearcher to execute the Query created by its prepare() method
Blue: the SolrIndexSearcher includes several caches in addition to the InvertedIndex, and when executing a query, first evaluates the start/rows requested to fit a configured "window size" so that "page #2" type requests can result in a cache hit & re-use the results computed for "page #1"
Orange: The low level InvertedIndex & the queryResultCache that can be used in its place when executing basic searches & the DocList containing a sorted list of (internal) doc#s and their scores for the requested start+rows of this query
Purple: The Stored Fields of the documents in the index & the documentCache used by SolrIndexSearcher to reduce disk reads when popular documents are frequently matched by searches
Green: Named instances of (pluggable) QueryResponseWriters which dictate how the data structures produced once a request is processed get serialized into bytes (for the HTTP response returned to the original client by Jetty)
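The "window size" idea can be made concrete with a small sketch. It assumes the requested range is rounded up to a multiple of the window size and that the cache holds one ordered id list per query; `ToySearcher`, `window_end`, and the counter `index_searches` are invented for the example and do not mirror SolrIndexSearcher's internals.

```python
import math


def window_end(start, rows, window_size):
    """Round start+rows up to the next multiple of window_size."""
    return max(1, math.ceil((start + rows) / window_size)) * window_size


class ToySearcher:
    def __init__(self, ranked_ids, window_size):
        self.ranked_ids = ranked_ids    # stand-in for running the real query
        self.window_size = window_size
        self.cache = {}                 # query key -> (window length, ids)
        self.index_searches = 0

    def search(self, query_key, start, rows):
        needed = window_end(start, rows, self.window_size)
        cached = self.cache.get(query_key)
        if cached is None or cached[0] < needed:
            # Cache miss: collect the whole window, not just the page asked for.
            self.index_searches += 1
            cached = (needed, self.ranked_ids[:needed])
            self.cache[query_key] = cached
        return cached[1][start:start + rows]


searcher = ToySearcher(list(range(100)), window_size=20)
page1 = searcher.search("q=ipod", start=0, rows=10)
page2 = searcher.search("q=ipod", start=10, rows=10)   # served from the cache
searches_after_page2 = searcher.index_searches
page3 = searcher.search("q=ipod", start=20, rows=10)   # outside the cached window
```

With a window of 20, the page 2 request re-uses the list collected for page 1, and only the page 3 request goes back to the index.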


Page 9: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

More Complex Query
Single Node: Single SolrCore

http://localhost:8983/solr/techproducts/select ? q = ipod & fq = price:[* TO 1000] & sort = div(popularity,price) asc, score desc & fl = id, name, why:[explain style=nl] & facet = true & facet.field = cat

This slightly more interesting query builds off the previous example by:

- Adding a "filter query" on the (numeric) price field
- Changing the primary sort criteria to be a mathematical function against 2 fields
- Requesting an additional pseudo-field explaining the score of each document
- Faceting on the "cat" (aka: category) field


Page 10: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks


The HTTP, Webapp, DispatchFilter, CoreContainer, SolrCore, and RequestHandler layers all function exactly as in our previous (simpler) example. It's only once the SearchHandler starts looping over the components that things get more interesting....


Page 11: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

query: QueryComponent.prepare()
  etc...
  "price:[* TO 1000]" ➔ float ➔ PointRangeQuery(...) ➔ filters[]
  div(popularity,price) ➔ ValueSource(IntFieldSource,...)

FacetComponent.prepare()
  facet=true ✔
  facet.field=cat ➔ ok?
  needDocSet = true

IndexSchema - SchemaFields ➔ FieldTypes
  FloatPointField: float - ValueSource - getRangeQuery() - etc...
  IntPointField: int - ValueSource - etc...

ValueSourceParsers
  div()
  sum()
  etc...

SolrIndexSearcher

Most items identical to those shown in the "simple" query are omitted for brevity. Of the new items shown here...

Red: In addition to some additional logic in the QueryComponent.prepare() method (to parse the filter query and more complex sort) we now also see the FacetComponent.prepare() method, which does its own validation & sets a flag indicating that it needs extra info (the DocSet) once SolrIndexSearcher is asked to execute the Query
Green: Named instances of (pluggable) ValueSourceParsers for parsing function strings -- used here in our sort, but could also be used in queries
Orange: As before the IndexSchema, now showing that FieldTypes are also responsible for providing the range query (filter) and ValueSources (used by the functions)
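To make the function-sort idea concrete, here is a minimal sketch of a registry of named functions evaluated against per-document field values. The expression is given pre-parsed as a tuple, so no string parsing is shown; `FUNCTIONS`, `evaluate`, and the sample documents are all made up for illustration and are not Solr's ValueSourceParser API.

```python
# Registry of named functions, analogous in spirit to pluggable function parsers.
FUNCTIONS = {
    "div": lambda a, b: a / b,
    "sum": lambda *args: sum(args),
}


def evaluate(expr, doc):
    """Evaluate a field name or a (function_name, arg, ...) tuple for one doc."""
    if isinstance(expr, tuple):
        name, *args = expr
        return FUNCTIONS[name](*(evaluate(arg, doc) for arg in args))
    return doc[expr]


# Hypothetical documents; field names follow the slide's sort param.
docs = [
    {"id": "a", "popularity": 4, "price": 8.0, "score": 1.0},
    {"id": "b", "popularity": 6, "price": 2.0, "score": 3.0},
    {"id": "c", "popularity": 1, "price": 2.0, "score": 2.0},
]

# sort = div(popularity,price) asc, score desc
expr = ("div", "popularity", "price")
ordered = sorted(docs, key=lambda d: (evaluate(expr, d), -d["score"]))
ordered_ids = [d["id"] for d in ordered]
print(ordered_ids)
```

Documents "a" and "c" both evaluate to 0.5, so the secondary score desc clause decides their order.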


Page 12: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

query: QueryComponent.process()
  search(...) ➔ 〈DocList,DocSet〉
  etc...

FacetComponent.process()
  For Each "cat" Index Terms: ➔ Intersect with DocSet

DocTransformers
  ExplainAugmenter: searcher.explain(#)
  ChildDocTransformer
  SubQueryAugmenter
  etc...

JsonResponseWriter
  DocList { + searcher.doc(#) ➔ Stored Fields + [explain ...] } + Facet Counts ➔ Bytes ➔ HTTP...

SolrIndexSearcher
  filterCache
  queryResultCache
  documentCache
  IndexReader - InvertedIndex - Stored Fields

Most items identical to those shown in the "simple" query are omitted for brevity. Of the new items shown here...

Red: Now when QueryComponent.process() executes the search, the "needDocSet" flag set by FacetComponent.prepare() is also used. FacetComponent.process() can then use the resulting DocSet (an unordered set of all matching doc#s -- regardless of sort) to compute the facet counts.
Olive: Named instances of (pluggable) DocTransformers (or Augmenters) which can be used to annotate individual documents returned in the results. For this query in particular we see the ExplainAugmenter, which uses the SolrIndexSearcher to get a (debugging) data structure "explaining" how the score of each document was computed.
Green: the JsonResponseWriter not only returns the Stored Fields of each document, but also the results of any DocTransformers. It also serializes the Facet Counts.


Page 13: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Simple Query
SolrCloud: 4 Nodes, 2 Shards, 2 Replicas

bin/solr -e cloud...

http://localhost:8983/solr/techproducts/select ? q = ipod & sort = inStock desc, score desc & fl = id, name & rows = 10

This is the same as our original simple query, still using the techproducts sample configs & data, but from here on we'll assume we're using a 4 node SolrCloud cluster, with the techproducts collection configured to have 2 shards, with a replication factor of 2.


Page 14: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Jetty: http://host1:8983
  SolrDispatchFilter: /techproducts ➔ tech_s1_r2

Jetty: http://host2:8983
  SolrDispatchFilter: /techproducts ?➔ tech_s2_r2

Jetty: http://host3:8983
  SolrDispatchFilter: /techproducts ?➔ host4

Jetty: http://host4:8983
  SolrDispatchFilter: /techproducts ➔ tech_s2_r1

SolrCores shown across the 4 nodes (collection: core)
  techproducts: tech_s1_r1
  techproducts: tech_s1_r2
  techproducts: tech_s2_r1
  techproducts: tech_s2_r2
  foo: foo_s1_r1
  foo: foo_s1_r2
  foo: foo_s2_r1
  foo: foo_s2_r2

Purple: 4 Jetty instances, running on (the same port 8983 of) 4 different hosts
Black: The 4 SolrDispatchFilters running inside each of these 4 Jetty instances, and how each of them resolves requests for the techproducts collection.
Green: the individual SolrCores (which are each a replica of some shard of a collection) running in each Solr node. Note that for the purposes of illustrating the different possible ways a Solr request may be routed, host3 does not contain any SolrCores that are part of the techproducts collection.

(Other layers such as the Solr webapp and the CoreContainer have been omitted to save space)
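The routing choices in this diagram can be sketched as a small function: a node that hosts a core of the collection picks one locally, and a node that hosts none forwards the request to a node that does. Everything here is an assumption for illustration, including the function `resolve`, the random choice, and especially the core-to-host placement, which the flattened diagram does not fully specify.

```python
import random

# HYPOTHETICAL placement of techproducts cores on hosts; host3 has none.
TECHPRODUCTS_CORES = {
    "tech_s1_r2": "host1",
    "tech_s1_r1": "host2",
    "tech_s2_r2": "host2",
    "tech_s2_r1": "host4",
}


def resolve(host, core_to_host, rng):
    """Return ('local', core) if this host has a core, else ('forward', host)."""
    local = sorted(core for core, h in core_to_host.items() if h == host)
    if local:
        return ("local", rng.choice(local))
    candidates = sorted(set(core_to_host.values()))
    return ("forward", rng.choice(candidates))


rng = random.Random(0)
from_host1 = resolve("host1", TECHPRODUCTS_CORES, rng)
from_host2 = resolve("host2", TECHPRODUCTS_CORES, rng)
from_host3 = resolve("host3", TECHPRODUCTS_CORES, rng)
print(from_host1, from_host2, from_host3)
```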


Page 15: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

coordinator
  SearchHandler: /select
    Repeat until done:
      query.distributedProcess ➔ ShardRequests (α,β)
      Loop: ShardRequests
      query.handleResponse
  QueryComponent: distributedProcess()
    α: shard top10 + sort values
    β: full fl for final top10 ids
  FacetComponent

shard1
  QueryComponent: prepare() + process()
    α: q=ipod&fl=id&fsv=true ➔ top ids + sort values
    β1: ids=X,Y,Z&fl=name ➔ ...

shard2
  QueryComponent: prepare() + process()
    α: q=ipod&fl=id&fsv=true ➔ top ids + sort values
    β2: ids=A,..,G&fl=name ➔ ...


Page 16: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Purple: The HTTP Layer showing 3 hosts: an arbitrary 'coordinator' node, and 2 nodes each hosting a replica of the 2 shards for the collection
Black: SearchHandler. On the coordinator node, SearchHandler executes new logic to send the sub-requests created by its SearchComponents to arbitrarily selected replicas of each shard. On the replicas handling these sub-requests, the SearchHandler processes these requests just as if they were simple (single node) queries.
Red: SearchComponent methods. On the coordinator node SearchHandler loops over every component calling SearchComponent.distributedProcess() to create/modify sub-requests for the individual shards, and then calls SearchComponent.handleResponse() to merge the results from each shard and decide if/when/what additional information may be needed. This process repeats until all calls to distributedProcess() on all SearchComponents indicate that they are finished.
Green & Blue: The 2 stages (α & β) of shard sub-requests needed to process this simple query. Note that the α-requests are identical for both shards, but the β-requests are slightly different to request the fl fields for the matches specific to that shard.
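The "repeat until every component is finished" loop can be sketched as below. This is a toy model under stated assumptions: `run_distributed`, `TwoStageComponent`, and the `send` callback are invented names, requests are plain (shard, stage) tuples, and real Solr tracks stages and outgoing requests differently.

```python
def run_distributed(components, send):
    """Ask components for shard requests, send them, hand back responses,
    and repeat until no component has anything left to request."""
    rounds = 0
    while True:
        outgoing = []
        for component in components:
            outgoing.extend(component.distributed_process())
        if not outgoing:
            return rounds
        responses = [(request, send(request)) for request in outgoing]
        for component in components:
            component.handle_responses(responses)
        rounds += 1


class TwoStageComponent:
    """Toy component needing an alpha round and then a beta round per shard."""

    def __init__(self, shards):
        self.shards = shards
        self.pending = ["alpha", "beta"]
        self.received = []

    def distributed_process(self):
        if not self.pending:
            return []                      # signals "finished"
        return [(shard, self.pending[0]) for shard in self.shards]

    def handle_responses(self, responses):
        self.received.extend(responses)
        if self.pending:
            self.pending.pop(0)            # advance to the next stage


sent = []


def fake_send(request):
    sent.append(request)                   # stand-in for an HTTP sub-request
    return "ok"


rounds = run_distributed([TwoStageComponent(["shard1", "shard2"])], fake_send)
print(rounds, sent)
```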


Page 17: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Shard Request α
q=ipod&fl=id&fsv=true&rows=10
sort=inStock desc, score desc

Shard 1 (numFound=42): F〈true,6〉 B〈true,6〉 D〈true,5〉 C〈true,3〉 G〈true,2〉 A〈true,1〉 E〈false,5〉
Shard 2 (numFound=314): Z〈true,6〉 X〈true,3〉 Y〈false,9〉

Shard Request β
q=ipod&ids=...&fl=name

Shard 1: A, Apple; B, Boat; C, Car; D, Deer; E, Ear; F, Frog; G, Gong
Shard 2: X, X-Ray; Y, Yo-Yo; Z, Zebra

Merged (numFound=42+314=356)
Z, Zebra
F, Frog
B, Boat
D, Deer
C, Car
X, X-Ray
G, Gong
A, Apple
Y, Yo-Yo
E, Ear

Here we see hypothetical α requests+responses, hypothetical β requests+responses, & the final Merged results from both -- showing how the IDs and sort values from the α request are used to determine which documents will be in the final results, and in which order. For these specific documents, the β requests+responses fill in the fl fields for the final client.

Red & Blue: The responses from shard1 & shard2 for the α request
Green & Purple: The responses from shard1 & shard2 for the β request
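The α merge can be reproduced with the hypothetical ids and sort values from this slide. The sketch below is an illustration, not Solr's merge code: `merge_alpha` and `beta_requests` are made-up names, and documents with identical sort values (such as F, B, and Z) may come out in a different relative order than the slide shows, since the slide does not state a tie-break rule.

```python
# (id, inStock, score) per shard, copied from the slide's α responses.
SHARD_ALPHA = {
    "shard1": [("F", True, 6), ("B", True, 6), ("D", True, 5), ("C", True, 3),
               ("G", True, 2), ("A", True, 1), ("E", False, 5)],
    "shard2": [("Z", True, 6), ("X", True, 3), ("Y", False, 9)],
}


def merge_alpha(shard_alpha, rows):
    """Merge per-shard top docs using only their sort values."""
    merged = [
        (doc_id, in_stock, score, shard)
        for shard, docs in shard_alpha.items()
        for doc_id, in_stock, score in docs
    ]
    # sort = inStock desc, score desc
    merged.sort(key=lambda d: (not d[1], -d[2]))
    return merged[:rows]


def beta_requests(top):
    """Group the winning ids by the shard that owns them (the β 'ids' param)."""
    ids_by_shard = {}
    for doc_id, _in_stock, _score, shard in top:
        ids_by_shard.setdefault(shard, []).append(doc_id)
    return ids_by_shard


top10 = merge_alpha(SHARD_ALPHA, rows=10)
beta = beta_requests(top10)
print([doc[0] for doc in top10], beta)
```

Each shard is then asked only for the fl fields of its own winning ids, which is why the two β requests differ.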


Page 18: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Complex Query*

SolrCloud: 4 Node, 2 Shards, 2 Replicas

http://localhost:8983/solr/techproducts/select ? q = ipod & sort = inStock desc, score desc & fl = id, name & facet = true & facet.field = cat

In the interest of time, this query is not as "Complex" as the "Complex" Single Core query we looked at before. I've omitted things like fq params, sorting on functions, and the use of DocTransformers in the fl because nothing about how those are handled in a Single Core query changes when they are requested by a coordinator node in a SolrCloud query.


Page 19: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

coordinator
  SearchHandler: /select ➔ ShardRequests (α, β)
  QueryComponent: distributedProcess()
    α: shard top10 + sort values
    β: full fl for final top10 ids
  FacetComponent: distributedProcess()
    α: facet.field=cat w/facet.limit overrequest
    β: request missing counts for final top terms

shard1
  QueryComponent: prepare() + process()
    α: q=ipod&fl=id&fsv=true ➔ top ids + sort values
    β1: ids=X,Y,Z&fl=name ➔ ...
  FacetComponent: prepare() + process()
    α: facet.limit=N + extra ➔ top terms w/counts
    β1: ..._terms=aa,qq,... ➔ ...

shard2


Page 20: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Purple: The HTTP Layer showing 3 hosts: an arbitrary 'coordinator' node, and 2 nodes each hosting a replica of the 2 shards for the collection. To save space, the (largely redundant) details of the requests to shard2 are not shown.
Black: SearchHandler. To save space, the details (shown in previous diagrams) regarding how SearchHandler processes requests when acting as a coordinator have been omitted -- the key thing to note is that even with the added complexity of the FacetComponent, there are still only 2 stages of sub-requests to each shard (α & β)
Red: SearchComponent methods:
- QueryComponent behaves exactly as before
- Now that FacetComponent is in use, it can modify the sub-requests created by QueryComponent to "piggy back" on them and request additional information from each shard.
Green & Blue: The 2 stages (α & β) of shard sub-requests needed to process this query. Although the details of the requests to shard2 are omitted for brevity, the α-requests are identical for both shards, and (as before) the β-requests are slightly different to request both the fl fields for the document matches specific to that shard, as well as the facet counts for any "candidate" terms that were not included in the α response from that shard.


Page 21: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Shard Request α
facet.field=cat
facet.limit=N+OVERREQUEST

Shard 1: games: 40, ..., lawn: 20, books: 10, DVD: 5, ..., beach: 4, toys: 3
Shard 2: auto: 250, lawn: 170, ..., food: 100, DVD: 97, ..., books: 90, clothing: 90

Merge α
auto: 250-253 (? + 250)
lawn: 190 (20 + 170)
...
games: 40-130 (40 + ?)
food: 100-103 (? + 100)
DVD: 102 (5 + 97)
...

Shard Request β
facet.field={!_terms=...}cat

Shard 1: auto: 3, food: 0
Shard 2: games: 45

Final (Merge α+β)
auto: 253 (3 + 250)
lawn: 190 (20 + 170)
...
DVD: 102 (5 + 97)


Page 22: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Here we see the additional information involved in α & β requests+responses+merging for our more complex queries compared to what we looked at before. The information requested & merged by QueryComponent is omitted for brevity, and we focus solely on how FacetComponent modifies those requests to "overrequest" the original facet.limit and what it does with the results.

In the α request, over-request additional terms from each shard beyond what the user asked for; in the β request, ask each shard for the details about any terms that are "candidates" for the final results but were NOT already returned by this shard in the α response.

Each term that is a candidate for the final response is shown in a unique color. Black/Grey is used to indicate terms where incomplete information is available to the coordinator, but enough is known to be confident that they can't possibly be candidates for the final results. Faded terms (in italics) show at what stage the coordinating FacetComponent knows that particular term can be eliminated from consideration.

(While the "..." ellipses are used to denote the possibility of many additional terms depending on the value of facet.limit=N (which defaults to 100), viewers may find the easiest way to understand how these results are merged & refined is to assume N=3 and imagine the ellipses do not exist in the diagram)
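Taking the suggestion to assume N=3 and ignore the ellipses, the merge and refinement can be worked through in code using the counts from the diagram. This is one way to reason about the bounds, written for illustration; it is not Solr's FacetComponent implementation, and the function names are made up. The assumption is that a term a shard did not return can have at most that shard's smallest returned count.

```python
# α responses from the diagram (ellipses ignored).
ALPHA = {
    "shard1": {"games": 40, "lawn": 20, "books": 10, "DVD": 5, "beach": 4, "toys": 3},
    "shard2": {"auto": 250, "lawn": 170, "food": 100, "DVD": 97, "books": 90, "clothing": 90},
}
# β responses from the diagram.
BETA = {"shard1": {"auto": 3, "food": 0}, "shard2": {"games": 45}}


def merge_alpha(alpha):
    """Return term -> (known count, highest possible count)."""
    floor = {shard: min(counts.values()) for shard, counts in alpha.items()}
    bounds = {}
    for term in set().union(*alpha.values()):
        low = sum(counts.get(term, 0) for counts in alpha.values())
        high = low + sum(floor[s] for s, counts in alpha.items() if term not in counts)
        bounds[term] = (low, high)
    return bounds


def candidates(bounds, n):
    """Terms whose best case could still reach the current Nth-best known count."""
    threshold = sorted((low for low, _ in bounds.values()), reverse=True)[n - 1]
    return {term for term, (_, high) in bounds.items() if high >= threshold}


def refinement_requests(alpha, cands):
    """Per shard, the candidate terms that shard has not reported yet."""
    return {s: sorted(t for t in cands if t not in counts) for s, counts in alpha.items()}


def final_counts(alpha, beta, cands, n):
    totals = {
        t: sum(alpha[s].get(t, 0) + beta.get(s, {}).get(t, 0) for s in alpha)
        for t in cands
    }
    return sorted(totals.items(), key=lambda item: -item[1])[:n]


bounds = merge_alpha(ALPHA)
cands = candidates(bounds, n=3)
refine = refinement_requests(ALPHA, cands)
final = final_counts(ALPHA, BETA, cands, n=3)
print(refine, final)
```

The bounds match the "Merge α" ranges in the diagram (for example games: 40-130), and the refinement requests match the β terms each shard is asked about.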


Page 23: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Q & A


Page 24: Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Me: https://twitter.com/_hossman

My Company: https://www.lucidworks.com/

These Slides: https://home.apache.org/~hossman/rev2017/

Solr Docs & Mailing List: https://lucene.apache.org/solr/resources.html
