Advanced Query Parsing Techniques
![Page 1: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/1.jpg)
Advanced Query Parsing Techniques
Aruna Kumar Pamulapati (Arun), Technical Consultant
![Page 2: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/2.jpg)
2 The expert in the search space
Search Technologies Overview
Formed June 2005
Over 100 employees and growing
Over 500 customers worldwide
Presence in US, Latin America, UK & Germany
Deep enterprise search expertise
Consistent revenue growth and profitability
Search Engine Independent
![Page 3: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/3.jpg)
Lucene Relevancy: Simple Operators
term(A): TF(A) * IDF(A)
  Implemented with DefaultSimilarity / TermQuery
  TF(A) = sqrt(termInDocCount)
  IDF(A) = log(totalDocsInCollection/(docsWithTermCount+1)) + 1.0
and(A, B): A * B
  Implemented with BooleanQuery()
or(A, B): A + B
  Implemented with BooleanQuery()
max(A, B): max(A, B)
  Implemented with DisjunctionMaxQuery()
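To make the term(A) formulas concrete, here is a minimal Python sketch of the TF and IDF expressions exactly as written on this slide. The function names and the sample counts are illustrative assumptions, and the natural logarithm is assumed. Real Lucene scoring includes further factors (such as length norms) that the slide leaves out.

```python
import math


def tf(term_in_doc_count: int) -> float:
    """TF(A) = sqrt(termInDocCount), as given on the slide."""
    return math.sqrt(term_in_doc_count)


def idf(total_docs: int, docs_with_term: int) -> float:
    """IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0."""
    return math.log(total_docs / (docs_with_term + 1)) + 1.0


def term_score(term_in_doc_count: int, total_docs: int, docs_with_term: int) -> float:
    """term(A) = TF(A) * IDF(A) in the slide's simplified model."""
    return tf(term_in_doc_count) * idf(total_docs, docs_with_term)


# Hypothetical counts: the term occurs 4 times in the document and in
# 9 of the 100 documents in the collection.
print(term_score(4, 100, 9))
```

A rarer term (smaller docsWithTermCount) gets a larger IDF, so it contributes more to the score than a common term with the same in-document count.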
![Page 4: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/4.jpg)
Simple Operators - Example
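Query tree: and( or(george, martha), max(washington, custis) )
Term scores: george = 0.10, martha = 0.20, washington = 0.60, custis = 0.90
or(george, martha) = 0.1 + 0.2 = 0.30
max(washington, custis) = max(0.6, 0.9) = 0.90
and(or, max) = 0.3 * 0.9 = 0.27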
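The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the slide's simplified scoring model (and = product, or = sum, max = maximum); the helper names are assumptions and this is not how Lucene is invoked.

```python
from functools import reduce


def and_(*scores: float) -> float:
    """and(A, B) = A * B in the slide's model."""
    return reduce(lambda a, b: a * b, scores, 1.0)


def or_(*scores: float) -> float:
    """or(A, B) = A + B in the slide's model."""
    return sum(scores)


def max_(*scores: float) -> float:
    """max(A, B) = max(A, B)."""
    return max(scores)


# Term scores taken from the slide.
scores = {"george": 0.10, "martha": 0.20, "washington": 0.60, "custis": 0.90}

names = or_(scores["george"], scores["martha"])          # 0.30
surname = max_(scores["washington"], scores["custis"])   # 0.90
total = and_(names, surname)

print(round(total, 2))  # prints 0.27
```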
![Page 5: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/5.jpg)
Less Used Operators
boost(f, A): (A * f)
  Implemented with Query.setBoost(f)
constant(f, A): if(A) then f else 0.0
  Implemented with ConstantScoreQuery()
boostPlus(A, B): if(A) then (A + B) else 0.0
  Implemented with BooleanQuery()
boostMul(f, A, B): if(B) then (A * f) else A
  Implemented with BoostingQuery()
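The four definitions above translate directly into small functions. The sketch below assumes that a score of 0.0 stands for "the sub-query did not match"; that convention and the function names are illustrative, not part of the deck.

```python
def boost(f: float, a: float) -> float:
    """boost(f, A) = A * f."""
    return a * f


def constant(f: float, a: float) -> float:
    """constant(f, A): a fixed score f whenever A matches, otherwise 0.0."""
    return f if a > 0.0 else 0.0


def boost_plus(a: float, b: float) -> float:
    """boostPlus(A, B): A is required; B only adds to the score when A matches."""
    return a + b if a > 0.0 else 0.0


def boost_mul(f: float, a: float, b: float) -> float:
    """boostMul(f, A, B): scale A by f only for documents that also match B."""
    return a * f if b > 0.0 else a


print(constant(0.5, 0.9))        # A matches, so the score is the constant
print(boost_plus(0.0, 0.3))      # A does not match, so B is ignored
print(boost_mul(0.5, 0.8, 0.2))  # B matches, so A is scaled by f
```

Note the asymmetry: boostPlus can only raise the score of a document that already matches A, while boostMul with f below 1.0 demotes documents matching B without excluding them.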
![Page 6: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/6.jpg)
Problem: Need for More Flexibility
Difficult / impossible to use all operators
  Many not available in standard query parsers
Complex expressions = string manipulation
  This is messy
Query construction is in the application layer
  Your UI programmer is creating query expressions? Seriously?
Hard to create and use new operators
  Requires modifying query parsers - yuck
![Page 7: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/7.jpg)
Query Processing Language
[Diagram: the User Interface sends the query to Solr, where the QPL Engine, driven by a QPL Script, builds the query that is passed to Search.]
![Page 8: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/8.jpg)
Introducing: QPL
Query Processing Language
  Domain Specific Language for Constructing Queries
  Built on Groovy
  https://wiki.searchtechnologies.com/index.php/QPL_Home_Page
Solr Plug-Ins
  Query Parser
  Search Component
“The 4GL for Text Search Query Expressions”
Server-side Solr Access
  Cores, Analyzers, Embedded Search, Results XML
![Page 9: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/9.jpg)
Solr Plug-Ins
![Page 10: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/10.jpg)
QPL Configuration – solrconfig.xml
Query Parser Configuration:
<queryParser name="qpl" class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
  <str name="scriptFile">parser.qpl</str>
  <str name="defaultField">text</str>
</queryParser>
Search Component Configuration:
<searchComponent name="qplSearchFirst" class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
  <str name="scriptFile">search.qpl</str>
  <str name="defaultField">text</str>
  <str name="isProcessScript">false</str>
</searchComponent>
![Page 11: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/11.jpg)
QPL Example #1
Tokenize:
  myTerms = solr.tokenize(query);
Phrase Query:
  phraseQ = phrase(myTerms);
And Query:
  andQ = and(myTerms);
Or Query:
  orQ = (myTerms.size() <= 2) ? null : orMin( (myTerms.size()+1)/2, myTerms);
Put It All Together:
  return phraseQ^3.0 | andQ^2.0 | orQ;
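The orQ expression on this slide only builds an OR query for queries of three or more terms, and then asks for roughly half of them to match. The Python sketch below isolates that threshold logic. It assumes that orMin(n, terms) means "at least n of the terms must match" and that the division rounds down; the deck states neither, and Groovy's `/` can yield a fractional value, so treat both as assumptions.

```python
from typing import Optional


def or_min_threshold(term_count: int) -> Optional[int]:
    """Minimum number of terms that must match, or None when no OR query is built.

    Mirrors: (myTerms.size() <= 2) ? null : orMin((myTerms.size()+1)/2, myTerms)
    Assumption: floor division stands in for whatever rounding orMin applies.
    """
    if term_count <= 2:
        return None
    return (term_count + 1) // 2


for n in range(1, 7):
    print(n, or_min_threshold(n))
```

Under these assumptions a five-term query needs three matching terms, while one- and two-term queries rely on the phrase and AND queries alone.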
![Page 12: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/12.jpg)
Thesaurus Example #2
Original Query: bathroom humor
Tokenize:
  myTerms = solr.tokenize(query);
Load Thesaurus (cached):
  thes = Thesaurus.load("thesaurus.xml")
Thesaurus Expansion:
  thesQ = thes.expand(0.8f, solr.tokenizer("text"), myTerms);
  Result: [or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
Put It All Together:
  return and(thesQ);
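The expansion shown for "bathroom humor" can be reproduced with a short Python sketch. The dictionary below is a hypothetical stand-in for thesaurus.xml, and `expand` is an illustrative function, not the QPL Thesaurus API; it renders each expanded term as a string in the same notation as the slide.

```python
from typing import Dict, List

# Hypothetical thesaurus content, chosen to mirror the slide's example.
THESAURUS: Dict[str, List[str]] = {
    "bathroom": ["loo", "wc"],
    "humor": ["jokes"],
}


def expand(weight: float, terms: List[str], thesaurus: Dict[str, List[str]]) -> List[str]:
    """Replace each term by an or() of itself and its down-weighted synonyms."""
    expanded = []
    for term in terms:
        synonyms = thesaurus.get(term, [])
        if synonyms:
            parts = [term] + [f"{syn}^{weight}" for syn in synonyms]
            expanded.append(f"or({', '.join(parts)})")
        else:
            # Terms without synonyms pass through unchanged.
            expanded.append(term)
    return expanded


print(expand(0.8, ["bathroom", "humor"], THESAURUS))
```

Because the original term keeps its full weight and synonyms are boosted by 0.8, a document containing the user's own word outranks one that only contains a synonym.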
![Page 13: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/13.jpg)
More Operators
Boolean Query Parser:
  pQ = parseQuery("(george or martha) near/5 washington")
Relevancy Ranking Operators:
  q1 = boostPlus(query, optionalQ)
  q2 = boostMul(0.5, query, optionalQ)
  q3 = constant(0.5, query)
Composite Queries:
  compQ = and(compositeMax(["title":1.5, "body":0.8], "george", "washington"))
![Page 14: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/14.jpg)
14 The expert in the search space
News Feed Use Case
Order  Documents           Date
1      markets+terms       Today
2      markets             Today
3      terms               Today
4      companies           Today
5      markets+terms       Yesterday
6      markets             Yesterday
7      terms               Yesterday
8      companies           Yesterday
9      markets, companies  older
![Page 15: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/15.jpg)
News Feed Use Case – Step 1
Segments:
  markets = split(solr.markets, "\\s*;\\s*")
  marketsQ = field("markets", or(markets));
Terms:
  terms = solr.tokenize(query);
  termsQ = field("body", or(thesaurus.expand(0.9f, terms)))
Companies:
  compIds = split(solr.compIds, "\\s*;\\s*")
  compIdsQ = field("companyIds", or(compIds))
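The pattern "\\s*;\\s*" used for the markets and company IDs splits a semicolon-separated value and discards whitespace around each separator (the doubled backslashes are string escaping for the regex `\s*;\s*`). A minimal Python equivalent, with a hypothetical helper name and sample value, looks like this:

```python
import re
from typing import List


def split_param(value: str) -> List[str]:
    """Split on ';' with optional surrounding whitespace, dropping empty items."""
    parts = re.split(r"\s*;\s*", value.strip())
    return [part for part in parts if part]


# Hypothetical request-parameter value.
print(split_param("energy ; metals;tech"))
```

Dropping empty items is an addition of this sketch so that a trailing semicolon does not produce an empty search term; the deck does not say how QPL's split treats that case.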
![Page 16: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/16.jpg)
News Feed Use Case – Step 2
Setup:
  sdf = new SimpleDateFormat("yyyy-MM-dd")
  cal = Calendar.getInstance()
Today:
  todayDate = sdf.format(cal.getTime())
  todayQ = field("date_s", todayDate)
Yesterday:
  cal.add(Calendar.DAY_OF_MONTH, -1)
  yesterdayDate = sdf.format(cal.getTime())
  yesterdayQ = field("date_s", yesterdayDate)
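The date handling on this slide formats today's date as yyyy-MM-dd, steps the calendar back one day, and formats again. The same idea in Python, as a sketch with an illustrative function name and a fixed sample date so the result is reproducible:

```python
from datetime import date, timedelta
from typing import Tuple


def today_and_yesterday(today: date) -> Tuple[str, str]:
    """Return (today, yesterday) formatted as yyyy-MM-dd strings."""
    fmt = "%Y-%m-%d"
    yesterday = today - timedelta(days=1)
    return today.strftime(fmt), yesterday.strftime(fmt)


# A fixed date is used here; a live script would pass date.today().
print(today_and_yesterday(date(2024, 3, 1)))  # prints ('2024-03-01', '2024-02-29')
```

Subtracting a whole day, rather than decrementing the day-of-month number by hand, is what keeps the result correct across month and year boundaries.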
![Page 17: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/17.jpg)
News Feed Use Case – Step 3
Weighted Subject Queries:
  sq1 = constant(4.0, and(marketsQ, termsQ))
  sq2 = constant(3.0, marketsQ)
  sq3 = constant(2.0, termsQ)
  sq4 = constant(1.0, compIdsQ)
  subjectQ = max(sq1, sq2, sq3, sq4)
Weighted Time Queries:
  tq1 = constant(10.0, todayQ)
  tq2 = constant(1.0, yesterdayQ)
  timeQ = max(tq1, tq2)
  recentQ = and(subjectQ, timeQ)
Put it All Together:
  return max(recentQ, or(marketsQ, compIdsQ)^0.01)
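To see why these weights produce the nine-tier ordering from the News Feed Use Case table, the Python sketch below scores one representative document per tier. It uses the deck's simplified model (and = product, max = maximum, or = sum) and assumes each matching clause in the low-weight fallback contributes a raw score of 1.0; the function name and that raw score are assumptions.

```python
def news_score(markets: bool, terms: bool, companies: bool, day: str) -> float:
    """Score a document given which clauses it matches and its date bucket."""
    subject = max(
        4.0 if (markets and terms) else 0.0,  # sq1
        3.0 if markets else 0.0,              # sq2
        2.0 if terms else 0.0,                # sq3
        1.0 if companies else 0.0,            # sq4
    )
    time = {"today": 10.0, "yesterday": 1.0}.get(day, 0.0)  # max(tq1, tq2)
    recent = subject * time                                  # and(subjectQ, timeQ)
    # Fallback: or(marketsQ, compIdsQ)^0.01, assuming raw scores of 1.0.
    fallback = 0.01 * ((1.0 if markets else 0.0) + (1.0 if companies else 0.0))
    return max(recent, fallback)


# One representative document per tier, in the order given by the table.
tiers = [
    (True, True, False, "today"),
    (True, False, False, "today"),
    (False, True, False, "today"),
    (False, False, True, "today"),
    (True, True, False, "yesterday"),
    (True, False, False, "yesterday"),
    (False, True, False, "yesterday"),
    (False, False, True, "yesterday"),
    (True, False, False, "older"),
]
tier_scores = [news_score(*tier) for tier in tiers]
print(tier_scores)
```

Because the weakest subject match today (1.0 * 10.0) still beats the strongest subject match yesterday (4.0 * 1.0), date dominates and subject strength orders documents within each day.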
![Page 18: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/18.jpg)
BT RLP Tokenizer Use Case – Step 1
Define field type:
<tokenizer class="com.basistech.rlp.solr.RLPTokenizerFactory"
           rlpContext="<PATH>rlp-context-bl1.xml"
           postAltLemmas="false"
           lang="eng"
           postPartOfSpeech="false"/>
QPL Expansion:
finalExpandedQuery = transform(queryTerms, [
  TERM: { ctx ->
    def btCustomTokens = solr.tokenize("subject_bt", ctx.op.term)
    if (btCustomTokens.size() > 1)
      return or(term(btCustomTokens[0])^1.5, or(btCustomTokens[1..-1]));
    else
      return ctx.op;
  }
]);
![Page 19: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/19.jpg)
BT RLP Tokenizer Use Case – Step 2
Original User Query: following is "presentation on QPL"
QPL Parsed: and(and(term(following),term(is)), phrase(term(presentation),term(on),term(QPL)))
BT Expansion + QPL Transformation: and(and(or(term(following)^1.5,term(follow)),or(term(is)^1.5,term(be))),phrase(term(presentation),term(on),term(QPL)))
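The step from the parsed query to the expanded one is a recursive tree rewrite: every term node is re-tokenized, and when the tokenizer returns extra lemmas the node becomes an or() of the boosted original and the lemmas. The Python sketch below reproduces that on a tuple-based tree. The `LEMMAS` table stands in for the RLP tokenizer output, and the node layout, helper names, and the collapsing of a single-child or() are all assumptions made for illustration.

```python
# Hypothetical tokenizer output: surface form first, then lemmas.
LEMMAS = {"following": ["following", "follow"], "is": ["is", "be"]}


def tokenize(term):
    return LEMMAS.get(term, [term])


def or_(children):
    # Assumption: an or() over a single child collapses to that child.
    return children[0] if len(children) == 1 else ("or", children)


def expand_term(node):
    """TERM rule: boost the first token by 1.5 and OR it with the remaining tokens."""
    tokens = tokenize(node[1])
    if len(tokens) > 1:
        first = ("boost", ("term", tokens[0]), 1.5)
        rest = or_([("term", token) for token in tokens[1:]])
        return ("or", [first, rest])
    return node


def transform(node, rules):
    """Apply the TERM rule to every term node, recursing through operators."""
    kind = node[0]
    if kind == "term":
        rule = rules.get("TERM")
        return rule(node) if rule else node
    if kind == "boost":
        return ("boost", transform(node[1], rules), node[2])
    return (kind, [transform(child, rules) for child in node[1]])


def render(node):
    kind = node[0]
    if kind == "term":
        return f"term({node[1]})"
    if kind == "boost":
        return f"{render(node[1])}^{node[2]}"
    return f"{kind}({','.join(render(child) for child in node[1])})"


parsed = ("and", [
    ("and", [("term", "following"), ("term", "is")]),
    ("phrase", [("term", "presentation"), ("term", "on"), ("term", "QPL")]),
])
expanded = transform(parsed, {"TERM": expand_term})
print(render(expanded))
```

The phrase terms are visited too, but since the stand-in tokenizer returns a single token for each of them they pass through unchanged, which matches the expanded query on the slide.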
![Page 20: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/20.jpg)
BT RLP Tokenizer Use Case – Step 3
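[Diagram: query tree for the expanded query. The root and has two children: an and node over or(Following^1.5, follow) and or(is^1.5, be), and a phrase node for "Presentation on QPL".]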
![Page 21: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/21.jpg)
Embedded Search Example #1
Tokenize:
  qTerms = solr.tokenize(query);
Execute an Embedded Search:
  results = solr.search('subjectsCore', or(qTerms), 50)
Create a query from the results:
  subjectsQ = or(results*.subjectId)
Put it all together:
  return field("title", and(qTerms)) | subjectsQ^0.9;
![Page 22: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/22.jpg)
Embedded Search Example #2
Tokenize:
  qTerms = solr.tokenize(query);
Execute an Embedded Search:
  results = solr.search('categories', and(qTerms), 10)
Create a Solr named list:
  myList = solr.newList();
  myList.add("relatedCategories", results*.title);
Add it to the XML response:
  solr.addResponse(myList)
![Page 23: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/23.jpg)
Other Features
Embedded Grouping Queries
  Oh yes they did!
Proximity operators
  ADJ, NEAR/#, BEFORE/#
Reverse Lemmatizer
  Prefers exact matches over variants
Transformer
  Applies transformations recursively to query trees
![Page 24: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/24.jpg)
Query Processing Language
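[Diagram: the Application Dev Team owns the User Interface, which sends data as entered by the user to the QPL Engine in Solr. The Search Team owns the QPL Script, which the QPL Engine uses to produce the Boolean query expression passed to Search.]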
![Page 25: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/25.jpg)
Query Processing Language
[Diagram: User Interface, QPL Engine and Search inside Solr, as before; the QPL Script can additionally draw on an RDBMS, other indexes, and a thesaurus.]
![Page 26: Advanced Query Parsing Techniques](https://reader033.fdocuments.us/reader033/viewer/2022061212/54958cacac7959132e8b4e95/html5/thumbnails/26.jpg)
More on QPL…
http://www.searchtechnologies.com/query-parsing-language.html