Advanced query parsing techniques
Uploaded by: lucenerevolution | Category: Technology
![Page 1: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/1.jpg)
Advanced Relevancy Ranking
Paul Nelson
Chief Architect / Search Technologies
![Page 2: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/2.jpg)
Search Technologies Overview
• Formed June 2005
• Over 100 employees and growing
• Over 400 customers worldwide
• Presence in US, Latin America, UK & Germany
• Deep enterprise search expertise
• Consistent revenue growth and profitability
• Search Engine Independent
![Page 3: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/3.jpg)
Lucene Relevancy: Simple Operators
• term(A) → TF(A) * IDF(A)
  • Implemented with DefaultSimilarity / TermQuery
  • TF(A) = sqrt(termInDocCount)
  • IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0
• and(A, B) → A * B
  • Implemented with BooleanQuery()
• or(A, B) → A + B
  • Implemented with BooleanQuery()
• max(A, B) → max(A, B)
  • Implemented with DisjunctionMaxQuery()
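The TF and IDF formulas above can be checked with a few lines of Python. This is a minimal sketch of the slide's simplified formula only; it assumes the natural logarithm and ignores everything else a real Lucene similarity applies (norms, query normalization, coordination). The function names are illustrative, not Lucene API.

```python
import math


def tf(term_in_doc_count: int) -> float:
    """Term frequency factor from the slide: sqrt of the in-document count."""
    return math.sqrt(term_in_doc_count)


def idf(total_docs: int, docs_with_term: int) -> float:
    """Inverse document frequency from the slide (natural log assumed)."""
    return math.log(total_docs / (docs_with_term + 1)) + 1.0


def term_score(term_in_doc_count: int, total_docs: int, docs_with_term: int) -> float:
    """term(A) = TF(A) * IDF(A), with no other scoring factors."""
    return tf(term_in_doc_count) * idf(total_docs, docs_with_term)


if __name__ == "__main__":
    # A term occurring 4 times in a document, found in 9 of 1000 documents.
    print(f"{term_score(4, 1000, 9):.2f}")  # 11.21
```

Rare terms get a larger IDF, and repeated occurrences raise the score only with the square root of the count, so a term that appears 100 times scores 10 times higher than a single occurrence, not 100 times.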
![Page 4: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/4.jpg)
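Simple Operators - Example
Query tree: and(or(george, martha), max(washington, custis))
Term scores: george = 0.10, martha = 0.20, washington = 0.60, custis = 0.90
• or(george, martha): 0.10 + 0.20 = 0.30
• max(washington, custis): max(0.60, 0.90) = 0.90
• and(or, max): 0.30 * 0.90 = 0.27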
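The worked example can be reproduced by treating each operator as plain arithmetic on scores. This sketch uses the deck's simplified model (and = product, or = sum, max = maximum); the helper names `and_`, `or_` and `max_` are made up for illustration and are not QPL or Lucene functions.

```python
import math


def and_(*scores: float) -> float:
    """and(A, B, ...) modeled as the product of the sub-scores."""
    return math.prod(scores)


def or_(*scores: float) -> float:
    """or(A, B, ...) modeled as the sum of the sub-scores."""
    return sum(scores)


def max_(*scores: float) -> float:
    """max(A, B, ...) modeled as the largest sub-score."""
    return max(scores)


# Per-term scores taken from the example.
scores = {"george": 0.10, "martha": 0.20, "washington": 0.60, "custis": 0.90}

left = or_(scores["george"], scores["martha"])         # 0.10 + 0.20
right = max_(scores["washington"], scores["custis"])   # max(0.60, 0.90)
total = and_(left, right)                              # left * right

if __name__ == "__main__":
    print(f"{total:.2f}")  # 0.27
```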
![Page 5: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/5.jpg)
Less Used Operators
• boost(f, A) → (A * f)
  • Implemented with Query.setBoost(f)
• constant(f, A) → if(A) then f else 0.0
  • Implemented with ConstantScoreQuery()
• boostPlus(A, B) → if(A) then (A + B) else 0.0
  • Implemented with BooleanQuery()
• boostMul(f, A, B) → if(B) then (A * f) else A
  • Implemented with BoostingQuery()
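The four definitions above translate directly into score arithmetic. The sketch below assumes that a sub-query which does not match a document scores 0.0, so "if(A)" becomes "A > 0.0"; that convention and the Python function names are assumptions for illustration, not QPL code.

```python
def boost(f: float, a: float) -> float:
    """boost(f, A): scale the score of A by f."""
    return a * f


def constant(f: float, a: float) -> float:
    """constant(f, A): every document matching A gets the fixed score f."""
    return f if a > 0.0 else 0.0


def boost_plus(a: float, b: float) -> float:
    """boostPlus(A, B): A is required; B only adds to the score when A matches."""
    return a + b if a > 0.0 else 0.0


def boost_mul(f: float, a: float, b: float) -> float:
    """boostMul(f, A, B): multiply A's score by f only when B also matches."""
    return a * f if b > 0.0 else a


if __name__ == "__main__":
    print(constant(0.5, 0.3), constant(0.5, 0.0))          # 0.5 0.0
    print(boost_mul(0.5, 0.8, 0.3), boost_mul(0.5, 0.8, 0.0))  # 0.4 0.8
```

Note the asymmetry: in boostPlus and boostMul the second query never decides whether a document is returned, it only adjusts the score of documents that already match the first one.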
![Page 6: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/6.jpg)
Problem: Need for More Flexibility
• Difficult / impossible to use all operators
  • Many not available in standard query parsers
• Complex expressions = string manipulation
  • This is messy
• Query construction is in the application layer
  • Your UI programmer is creating query expressions?
  • Seriously?
• Hard to create and use new operators
  • Requires modifying query parsers - yuck
![Page 7: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/7.jpg)
Query Processing Language
[Diagram: User Interface → QPL Engine → Solr Search, with a QPL Script feeding the QPL Engine.]
![Page 8: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/8.jpg)
Introducing: QPL
• Query Processing Language
  • Domain Specific Language for Constructing Queries
  • Built on Groovy
  • https://wiki.searchtechnologies.com/index.php/QPL_Home_Page
• Solr Plug-Ins
  • Query Parser
  • Search Component
• “The 4GL for Text Search Query Expressions”
  • Server-side Solr Access
    • Cores, Analyzers, Embedded Search, Results XML
![Page 9: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/9.jpg)
Solr Plug-Ins
![Page 10: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/10.jpg)
QPL Configuration – solrconfig.xml
Query Parser Configuration:
    <queryParser name="qpl"
                 class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
      <str name="scriptFile">parser.qpl</str>
      <str name="defaultField">text</str>
    </queryParser>
Search Component Configuration:
    <searchComponent name="qplSearchFirst"
                     class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
      <str name="scriptFile">search.qpl</str>
      <str name="defaultField">text</str>
      <str name="isProcessScript">false</str>
    </searchComponent>
![Page 11: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/11.jpg)
QPL Example #1
Tokenize:
    myTerms = solr.tokenize(query);
Phrase Query:
    phraseQ = phrase(myTerms);
And Query:
    andQ = and(myTerms);
Or Query:
    orQ = (myTerms.size() <= 2) ? null : orMin( (myTerms.size()+1)/2, myTerms);
Put It All Together:
    return phraseQ^3.0 | andQ^2.0 | orQ;
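To make the shape of the resulting query concrete, the sketch below renders the same construction as a string. It is a model of the script's logic, not QPL itself, and it rests on three assumptions not stated on the slide: the orMin threshold is rounded down to an integer, a null orQ is simply left out of the final expression, and `build_query` and `min_should_match` are hypothetical helper names.

```python
from typing import List


def min_should_match(n_terms: int) -> int:
    """Threshold passed to orMin: (n + 1) / 2, rounded down (assumption)."""
    return (n_terms + 1) // 2


def build_query(terms: List[str]) -> str:
    """Render phrase^3.0 | and^2.0 | orMin(...) for the given terms."""
    joined = ", ".join(terms)
    parts = [f"phrase({joined})^3.0", f"and({joined})^2.0"]
    # The slide only adds the orMin clause for queries longer than two terms.
    if len(terms) > 2:
        parts.append(f"orMin({min_should_match(len(terms))}, {joined})")
    return " | ".join(parts)


if __name__ == "__main__":
    print(build_query(["george", "washington"]))
    print(build_query(["george", "washington", "martha", "custis", "estate"]))
```

With two terms, an and query already requires both of them, so a "half of the terms" clause would add nothing; with five terms the sketch produces an orMin clause requiring three.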
![Page 12: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/12.jpg)
Thesaurus Example #2
Tokenize:
    myTerms = solr.tokenize(query);
Load Thesaurus (cached):
    thes = Thesaurus.load("thesaurus.xml")
Thesaurus Expansion:
    thesQ = thes.expand(0.8f, solr.tokenizer("text"), myTerms);
Put It All Together:
    return and(thesQ);
Original Query: bathroom humor
Expanded terms:
    [or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
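The expansion step can be pictured as a per-term lookup that wraps each term and its down-weighted synonyms in an or. The sketch below is a hedged model of that behaviour: the dictionary stands in for thesaurus.xml (its entries are taken from the example), the `expand` function is illustrative rather than the real Thesaurus API, and passing through terms without synonyms unchanged is an assumption.

```python
from typing import Dict, List

# Stand-in for thesaurus.xml; entries mirror the slide's example.
THESAURUS: Dict[str, List[str]] = {
    "bathroom": ["loo", "wc"],
    "humor": ["jokes"],
}


def expand(weight: float, terms: List[str], thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return one query expression per input term.

    A term with synonyms becomes or(term, syn^weight, ...); a term without
    synonyms is passed through unchanged (assumption).
    """
    expanded = []
    for term in terms:
        synonyms = thesaurus.get(term, [])
        if synonyms:
            variants = [term] + [f"{syn}^{weight}" for syn in synonyms]
            expanded.append("or(" + ", ".join(variants) + ")")
        else:
            expanded.append(term)
    return expanded


if __name__ == "__main__":
    print(expand(0.8, ["bathroom", "humor"], THESAURUS))
    # ['or(bathroom, loo^0.8, wc^0.8)', 'or(humor, jokes^0.8)']
```

Because the original term keeps its full weight and synonyms are scaled by 0.8, a document containing the user's exact word outranks one that matches only through a synonym.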
![Page 13: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/13.jpg)
More Operators
Boolean Query Parser:
    pQ = parseQuery("(george or martha) near/5 washington")
Relevancy Ranking Operators:
    q1 = boostPlus(query, optionalQ)
    q2 = boostMul(0.5, query, optionalQ)
    q3 = constant(0.5, query)
Composite Queries:
    compQ = and(compositeMax(["title":1.5, "body":0.8], "george", "washington"))
![Page 14: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/14.jpg)
News Feed Use Case

| Order | Documents | Date |
|---|---|---|
| 1 | markets+terms | Today |
| 2 | markets | Today |
| 3 | terms | Today |
| 4 | companies | Today |
| 5 | markets+terms | Yesterday |
| 6 | markets | Yesterday |
| 7 | terms | Yesterday |
| 8 | companies | Yesterday |
| 9 | markets, companies | older |
![Page 15: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/15.jpg)
News Feed Use Case – Step 1
Segments:
    markets = split(solr.markets, "\\s*;\\s*")
    marketsQ = field("markets", or(markets));
Terms:
    terms = solr.tokenize(query);
    termsQ = field("body", or(thesaurus.expand(0.9f, terms)))
Companies:
    compIds = split(solr.compIds, "\\s*;\\s*")
    compIdsQ = field("companyIds", or(compIds))
![Page 16: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/16.jpg)
News Feed Use Case – Step 2
Setup:
    sdf = new SimpleDateFormat("yyyy-MM-dd")
    cal = Calendar.getInstance()
Today:
    todayDate = sdf.format(cal.getTime())
    todayQ = field("date_s", todayDate)
Yesterday:
    cal.add(Calendar.DAY_OF_MONTH, -1)
    yesterdayDate = sdf.format(cal.getTime())
    yesterdayQ = field("date_s", yesterdayDate)
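For readers less familiar with java.util.Calendar, the same two date strings can be produced as follows. This is only an equivalent sketch in Python; `date_strings` is a hypothetical helper, and the optional `today` argument exists purely so the result can be checked against a fixed date.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def date_strings(today: Optional[date] = None) -> Tuple[str, str]:
    """Return (today, yesterday) formatted as yyyy-MM-dd."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    return today.strftime("%Y-%m-%d"), yesterday.strftime("%Y-%m-%d")


if __name__ == "__main__":
    print(date_strings(date(2013, 3, 1)))  # ('2013-03-01', '2013-02-28')
```

As with Calendar.add, subtracting one day crosses month boundaries correctly, so the first of a month yields the last day of the previous month. Matching on a formatted string works here because the date_s field stores the day as text in the same format.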
![Page 17: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/17.jpg)
![Page 18: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/18.jpg)
News Feed Use Case – Step 3
Weighted Subject Queries:
    sq1 = constant(4.0, and(marketsQ, termsQ))
    sq2 = constant(3.0, marketsQ)
    sq3 = constant(2.0, termsQ)
    sq4 = constant(1.0, compIdsQ)
    subjectQ = max(sq1, sq2, sq3, sq4)
Weighted Time Queries:
    tq1 = constant(10.0, todayQ)
    tq2 = constant(1.0, yesterdayQ)
    timeQ = max(tq1, tq2)
    recentQ = and(subjectQ, timeQ)
Put it All Together:
    return max(recentQ, or(marketsQ, compIdsQ)^0.01)
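It is worth checking that these weights really produce the nine-level ordering the use case asks for. The sketch below scores one representative document per level using the deck's simplified operator model (and = product, or = sum, max = maximum). It assumes each matching clause in the low-weight fallback contributes a raw score of 1.0, and the `score` function with its boolean arguments is a made-up stand-in for real index matches.

```python
def constant(f: float, matched: bool) -> float:
    """constant(f, A): fixed score f when A matches, otherwise 0.0."""
    return f if matched else 0.0


def score(markets: bool, terms: bool, companies: bool, day: str) -> float:
    """Score a document; day is 'today', 'yesterday' or 'older'."""
    subject = max(
        constant(4.0, markets and terms),
        constant(3.0, markets),
        constant(2.0, terms),
        constant(1.0, companies),
    )
    time = max(constant(10.0, day == "today"), constant(1.0, day == "yesterday"))
    recent = subject * time  # and(subjectQ, timeQ) modeled as a product
    # or(marketsQ, compIdsQ)^0.01, assuming a raw score of 1.0 per matching clause
    fallback = 0.01 * ((1.0 if markets else 0.0) + (1.0 if companies else 0.0))
    return max(recent, fallback)


# One representative document per row of the target ordering.
ROWS = [
    (True, True, False, "today"),
    (True, False, False, "today"),
    (False, True, False, "today"),
    (False, False, True, "today"),
    (True, True, False, "yesterday"),
    (True, False, False, "yesterday"),
    (False, True, False, "yesterday"),
    (False, False, True, "yesterday"),
    (True, False, True, "older"),
]

if __name__ == "__main__":
    for row in ROWS:
        print(row, score(*row))
```

Under these assumptions the scores come out as 40, 30, 20, 10, 4, 3, 2, 1 and 0.02, strictly decreasing in the required order. The factor of 10 between today and yesterday is what keeps the weakest subject match from today (10) above the strongest one from yesterday (4).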
![Page 19: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/19.jpg)
Embedded Search Example #1
Tokenize:
    qTerms = solr.tokenize(qTerms);
Execute an Embedded Search:
    results = solr.search('subjectsCore', or(qTerms), 50)
Create a query from the results:
    subjectsQ = or(results*.subjectId)
Put it all together:
    return field("title", and(qTerms)) | subjectsQ^0.9;
![Page 20: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/20.jpg)
Embedded Search Example #2
Tokenize:
    qTerms = solr.tokenize(qTerms);
Execute an Embedded Search:
    results = solr.search('categories', and(qTerms), 10)
Create a Solr named list:
    myList = solr.newList();
    myList.add("relatedCategories", results*.title);
Add it to the XML response:
    solr.addResponse(myList)
![Page 21: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/21.jpg)
Other Features
• Embedded Grouping Queries
  • Oh yes they did!
• Proximity operators
  • ADJ, NEAR/#, BEFORE/#
• Reverse Lemmatizer
  • Prefers exact matches over variants
• Transformer
  • Applies transformations recursively to query trees
![Page 22: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/22.jpg)
Query Processing Language
[Diagram: User Interface → QPL Engine → Solr Search, with a QPL Script feeding the QPL Engine. Flow labels: "Data as entered by user", "Boolean Query Expression". Team labels: Application Dev Team, Search Team.]
![Page 23: Advanced query parsing techniques](https://reader034.fdocuments.us/reader034/viewer/2022052623/559deca61a28ab34148b4760/html5/thumbnails/23.jpg)
QPL: Using External Sources to Build Queries
[Diagram: User Interface → QPL Engine → Solr Search, with a QPL Script feeding the QPL Engine. External sources: RDBMS, Other Indexes, Thesaurus.]