LEARNING TO RANK FOR SOLR
Michael Nilsson – Software Engineer
Diego Ceccarelli – Software Engineer
Joshua Pantony – Software Engineer
Bloomberg LP
Copyright 2015 Bloomberg L.P. All rights reserved.
OUTLINE ● Search at Bloomberg
● Why do we need machine learning for search?
● Learning to Rank
● Solr Learning to Rank Plugin
8 million searches PER DAY
1 million PER DAY
400 million stories in the index
SOLR IN BLOOMBERG ● Search engine of choice at Bloomberg
─ Large community / Well distributed committers
─ Open source Apache Project
─ Used within many commercial products
─ Large feature set and rapid growth
● Committed to open-source ─ Ability to contribute to core engine
─ Ability to fix bugs ourselves
─ Contributions in almost every Solr release since 4.5.0
PROBLEM SETUP
[Screenshot: two search results, scored 30 and 1.0]
Score = 100 * scoreOnTitle + 10 * scoreOnDescription
[Screenshot: the same results, now scored 52.2 and 30.8]
Score = 150 * scoreOnTitle + 3.14 * scoreOnDescription + 42 * clicks
Score = 99.9 * scoreOnTitle + 3.1114 * scoreOnDescription + 42.42 * clicks + 5 * timeElapsedFromLastUpdate
● It's hard to manually tweak the ranking
  ─ You must be an expert in the domain
  ─ … or a magician
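The hand-tuned formulas above are just weighted sums of per-document signals. A minimal sketch, using the slide's illustrative weights (not production values):

```python
# Hand-tuned linear scoring, as in the formulas above.
# The weights are the slide's illustrative values, not production numbers.
def score(score_on_title, score_on_description, clicks, time_since_update):
    return (99.9 * score_on_title
            + 3.1114 * score_on_description
            + 42.42 * clicks
            + 5 * time_since_update)
```

Every new signal adds another weight to balance against all the others, which is what makes manual tuning so brittle.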
query = solr
query = lucene
query = austin
query = bloomberg
query = …
It's easier with Machine Learning
● 2,000+ parameters (non-linear, factorially larger than the linear form)
● 8,000+ queries that are regularly tuned
● Early on, we spent many days hand-tuning…
SEARCH PIPELINE (ONLINE)
[Diagram: a user query goes to the index (people, commodities, news, other sources); top-x retrieval (x >> k) feeds a reranking model, which returns the top-k reranked results]
TRAINING PIPELINE (OFFLINE)
[Diagram: training query–document pairs are run against the index (people, commodities, news, other sources); feature extraction feeds a learning algorithm, which produces a ranking model that is evaluated with metrics]
TRAINING DATA: IMPLICIT VS EXPLICIT
What is explicit data?
● A set of judges manually assess the search results given a query
  ─ Experts
  ─ Crowd
● Pros: data is very clean
● Cons: can be very expensive!
What is implicit data?
● Infer user preferences from user behavior
  ─ Aggregated result clicks
  ─ Query reformulation
  ─ Dwell time
● Pros: a lot of data!
● Cons: extremely noisy; privacy concerns
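As a rough sketch of how implicit data might be turned into training labels, one could aggregate clicks into a per-document click-through rate, dropping query–document pairs with too few impressions to dampen the noise. The helper name and threshold below are illustrative, not Bloomberg's actual pipeline:

```python
from collections import Counter

def judgments_from_clicks(click_log, min_impressions=10):
    """click_log: iterable of (query, doc_id, clicked: bool) events.
    Returns {(query, doc_id): click-through rate} for pairs seen often
    enough; rarely-seen pairs are dropped as too noisy to label."""
    impressions, clicks = Counter(), Counter()
    for query, doc, clicked in click_log:
        impressions[(query, doc)] += 1
        if clicked:
            clicks[(query, doc)] += 1
    return {pair: clicks[pair] / n
            for pair, n in impressions.items() if n >= min_impressions}
```

Real systems must also correct for position bias (top results get clicked regardless of relevance), which is part of why implicit data is "extremely noisy".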
FEATURES
● A feature is an individual measurable property
● Given a query and a collection, we can produce many features for each document in the collection
  ─ Does the query match the title?
  ─ Length of the document
  ─ Number of views
  ─ How old is it?
  ─ Can it be visualized on a mobile device?
FEATURES
Extract "features" — features are signals that give an indication of a result's importance
Query: AAPL US
                                           Doc 1   Doc 2
Was the result a cofounder?                  0       0
Does the query match the document title?     0       1
Does the document have an exec. position?    1       0
Popularity (%)                              0.9     0.6
METRICS
How do we know if our model is doing better?
● Offline metrics
  ─ Precision / Recall / F1 score
  ─ nDCG (Normalized Discounted Cumulative Gain)
  ─ Other metrics (e.g., ERR, MAP, …)
● Online metrics
  ─ Click-through rate → higher is better
  ─ Time to first click → lower is better
  ─ Interleaving¹
¹O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue. Large scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems, 30(1), 2012.
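nDCG, mentioned above, compares a ranking's position-discounted gain to that of the ideal ordering of the same documents. A compact reference implementation of the standard formulation:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with the standard log2 position discount:
    sum of rel_i / log2(i + 2) over ranks i = 0, 1, 2, ..."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    """Normalize by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfect ranking scores 1.0; putting a relevant document lower drops the score, with early positions weighted most heavily.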
LEARNING TO RANK
● Learn how to combine features to optimize one or more metrics
● Many learning algorithms
  ─ RankSVM¹
  ─ LambdaMART²
  ─ …
¹T. Joachims. Optimizing Search Engines Using Clickthrough Data. Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2002.
²C.J.C. Burges. "From RankNet to LambdaRank to LambdaMART: An Overview". Microsoft Research Technical Report MSR-TR-2010-82, 2010.
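Pairwise methods like RankSVM reduce ranking to classification over document pairs: within each query, every (preferred, less-preferred) pair yields one training example whose feature vector is the difference of the two. A sketch of that standard transform (the data layout here is illustrative):

```python
def pairwise_transform(query_groups):
    """query_groups: list of per-query lists of (feature_vector, label).
    For each query, every pair where doc i has a higher label than doc j
    becomes a positive example with features f_i - f_j."""
    X, y = [], []
    for docs in query_groups:
        for (fi, li) in docs:
            for (fj, lj) in docs:
                if li > lj:  # doc i is preferred over doc j
                    X.append([a - b for a, b in zip(fi, fj)])
                    y.append(1)
    return X, y
```

A linear classifier trained on these differences learns a weight vector whose dot product with a document's features serves as its ranking score.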
SEARCH PIPELINE: STANDARD
[Diagram: a user query goes to Solr, which performs top-k retrieval from the index (people, commodities, news, other sources)]
[Diagram: offline, training data feeds a learning algorithm that produces a ranking model]
[Diagram: online, the ranking model reranks the top-x retrieved documents]
SEARCH PIPELINE: SOLR INTEGRATION
[Diagram: the same pipeline, with the ranking model running inside Solr]
SOLR RELEVANCY
● Pros
  ─ Simple and quick scoring computation
  ─ Phrase matching
  ─ Function query boosting on time, distance, popularity, etc.
  ─ Customized fields for stemming, synonyms, etc.
● Cons
  ─ Lots of manual effort to create a well-tuned query
  ─ Weights are brittle, and may break as more documents or fields are added
LTR PLUGIN: GOALS
● Don't tune the relevancy manually!
  ─ Use machine learning to power automatic relevancy tuning
● Significant relevancy improvements
● Allow comparable scores across collections
  ─ Collections of different sizes
● Maintain low latency
  ─ Re-use the vast Solr search functionality that is already built in
  ─ Less data transport
● Make it simple to use domain knowledge to rapidly create features
  ─ Features are no longer coded but rather scripted
STANDARD SOLR SEARCH REQUEST
[Diagram: a Solr query against a 10-million-document index finds 10k matches, scores all 10k, and retrieves the top 10]
LTR SOLR SEARCH REQUEST
[Diagram: the Solr query again finds and scores 10k matches, retrieves the top 1000, and an LTR query applies the ranking model to rerank them into the top 10]
<!-- Query parser used to rerank top docs with a provided model --> <queryParser name="ltr" class="org.apache.solr.ltr.ranking.LTRQParserPlugin" />
LTR PLUGIN: RERANKING
● LTRQuery extends Solr's RankQuery
  ─ Wraps the main query to fetch initial results
  ─ Returns a custom TopDocsCollector for reranked, ordered results
● Solr rerank request parameter:
  rq={!ltr model=myModel1 reRankDocs=100 efi.user_query='james' efi.my_var=123}
  ─ !ltr – name used in solrconfig.xml for the LTRQParserPlugin
  ─ model – name of the deployed model to use for reranking
  ─ reRankDocs – total number of documents to rerank
  ─ efi.* – custom parameters used to pass external feature information for your features to use
    • Query intent
    • Personalization
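Since rq uses Solr local-params syntax, the braces and bang must be URL-encoded when the request is assembled by hand. A small illustration (the server would be queried with this string; model and query values are the slide's placeholders):

```python
from urllib.parse import urlencode

# Hypothetical rerank request parameters, mirroring the rq example above.
params = {
    "q": "james",
    "rq": "{!ltr model=myModel1 reRankDocs=100 efi.user_query='james'}",
    "fl": "*,score",
}
# urlencode percent-escapes the local-params syntax ({ -> %7B, ! -> %21, ...)
query_string = urlencode(params)
```

Most Solr client libraries do this encoding automatically; it only matters when constructing the URL manually, e.g. for curl.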
SEARCH PIPELINE (ONLINE)
[Diagram: at query time, the top 1000 of 10k scored matches are retrieved; feature extraction runs on them, and the ranking model returns the top 10 reranked]
{ "name": "Tim Cook", "primary_position": "ceo", "category": "person", … }
LTR PLUGIN: FEATURES BEFORE
[Code for a hand-coded feature implementation (image not recovered)]
LTR PLUGIN: FEATURES AFTER
[
  {
    "name": "isPersonAndExecutive",
    "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
    "params": {
      "fq": [
        "{!terms f=category}person",
        "{!terms f=primary_position}ceo, cto, cfo, president"
      ]
    }
  },
  …
]
LTR PLUGIN: FUNCTION QUERIES
[
  {
    "name": "documentRecency",
    "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
    "params": {
      "q": "{!func}recip(ms(NOW,publish_date), 3.16e-11, 1, 1)"
    }
  },
  …
]
1 for docs dated now, 1/2 for docs dated 1 year ago, 1/3 for docs dated 2 years ago, etc.
See http://wiki.apache.org/solr/FunctionQuery#Date_Boosting
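Solr's recip(x, m, a, b) evaluates to a / (m*x + b). A quick check of the slide's claim, where m = 3.16e-11 is roughly the reciprocal of the milliseconds in a year, so the boost is 1 now, about 1/2 at one year, about 1/3 at two years:

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # ~3.156e10 ms, hence m ~ 3.16e-11

def recip(x, m, a, b):
    """Solr function query recip(x, m, a, b) = a / (m*x + b)."""
    return a / (m * x + b)

def recency_boost(age_ms):
    """Boost from the documentRecency feature above: recip over doc age."""
    return recip(age_ms, 3.16e-11, 1, 1)
```

Because the decay is hyperbolic rather than linear, fresh documents are separated sharply while old documents all get small, slowly shrinking boosts.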
LTR PLUGIN: FEATURE STORE
● FeatureStore is a Solr Managed Resource
  ─ REST API endpoint for performing CRUD operations on Solr objects
  ─ Stored and maintained in ZooKeeper
● Deploy
  ─ curl -XPUT 'http://yoursolrserver/solr/collection/config/fstore' --data-binary @./features.json -H 'Content-type:application/json'
● View
  ─ http://yoursolrserver/solr/collection/config/fstore
LTR PLUGIN: FEATURES
● Simplifies feature engineering through a configuration file
● Utilizes the rich search functionality built into Solr
  ─ Phrase matching
  ─ Synonyms, stemming, etc.
● Inherit the Feature class for specialized features
TRAINING PIPELINE (OFFLINE)
[Diagram: training queries are run against the 10-million-document index; the top 1000 of 10k scored matches go through feature extraction, and the learning algorithm produces the ranking model]
<!-- Document transformer adding feature vectors with each retrieved document --> <transformer name="fv" class= "org.apache.solr.ltr.ranking.LTRFeatureTransformer" />
LTR PLUGIN: FEATURE EXTRACTION
● Feature extraction uses Solr’s TransformerFactory ─ Returns a custom field with each document
● fl = *,[fv]
{
  "name": "Tim Cook",
  "primary_position": "ceo",
  "category": "person",
  …
  "[fv]": "isCofounder:0.0, isPersonAndExecutive:1.0, matchTitle:0.0, popularity:0.9"
}
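The [fv] field arrives as a comma-separated name:value string. A small helper to parse it back into numbers for a training pipeline (this helper is illustrative, not part of the plugin):

```python
def parse_feature_vector(fv):
    """Parse an '[fv]' string like 'isCofounder:0.0, matchTitle:1.0'
    into a {name: float} dict."""
    out = {}
    for pair in fv.split(","):
        # rpartition tolerates feature names that themselves contain ':'
        name, _, value = pair.strip().rpartition(":")
        out[name] = float(value)
    return out
```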
LTR PLUGIN: MODEL
{
  "type": "org.apache.solr.ltr.ranking.LambdaMARTModel",
  "name": "mymodel1",
  "features": [
    { "name": "matchedTitle" },
    { "name": "isPersonAndExecutive" }
  ],
  "params": {
    "trees": [
      {
        "weight": 1,
        "tree": {
          "feature": "matchedTitle",
          "threshold": 0.5,
          "left": { "value": -100 },
          "right": {
            "feature": "isPersonAndExecutive",
            "threshold": 0.5,
            "left": { "value": 50 },
            "right": { "value": 75 }
          }
        }
      }
    ]
  }
}
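To make the tree JSON concrete, here is a sketch of how such an ensemble scores a document, assuming a feature value at or below the threshold goes left and the final score is the weight-summed output of all trees (the plugin's own Java implementation is authoritative):

```python
def score_node(node, features):
    """Walk one regression tree: a node either holds a leaf 'value' or
    splits on feature <= threshold (left) vs. > threshold (right)."""
    if "value" in node:
        return node["value"]
    branch = "left" if features[node["feature"]] <= node["threshold"] else "right"
    return score_node(node[branch], features)

def score_model(model, features):
    """Weighted sum of tree outputs, per the model JSON layout above."""
    return sum(t["weight"] * score_node(t["tree"], features)
               for t in model["params"]["trees"])
```

For the single tree above, a title match plus an executive position yields 75, a title match alone 50, and no title match -100.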
LTR PLUGIN: MODEL
● ModelStore is also a Solr Managed Resource
● Deploy
  ─ curl -XPUT 'http://yoursolrserver/solr/collection/config/mstore' --data-binary @./model.json -H 'Content-type:application/json'
● View
  ─ http://yoursolrserver/solr/collection/config/mstore
● Inherit from the Model class for new scoring algorithms
  ─ score()
  ─ explain()
BEFORE AND AFTER
Query: "unemployment"
[Screenshots: Solr ranking vs. machine-learned reranking]
LTR PLUGIN: EVALUATION
● Offline metrics
  ─ nDCG increased approximately 10% after reranking
● Online metrics
  ─ Clicks @ 1 up by approximately 10%
● Performance
  ─ About 30% faster than the previous external ranking system
  ─ Setup: 10 million documents in the collection, 100k queries, 1k features, 1k documents reranked per query
LTR PLUGIN: BENEFITS
● Simpler feature engineering, without compiling
● Access to rich internal Solr search functionality for feature building
● Search result relevancy improvements vs. regular Solr relevancy
● Automatic relevancy tuning
● Compatible scores across collections
● Performance benefits vs. an external ranking system
FUTURE WORK ● Continue work to open source the plugin
● Support pipelining multiple reranking models
● Allow a simple ranking model to be used in the first pass
QUESTIONS?