Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon Hughes, Dice.com
October 11-14, 2016 • Boston, MA
Evolving The Optimal Relevancy Scoring Model at Dice.com
Simon Hughes, Chief Data Scientist, Dice.com
Who Am I?
• Chief Data Scientist at Dice.com and DHI, under Yuri Bykov
• Dice.com – leading US job board for IT professionals
• Twitter handle: https://twitter.com/hughes_meister
Key Projects
• Dice Skills pages - http://www.dice.com/skills
• New Dice Careers Mobile App
PhD
• PhD candidate at DePaul University, studying NLP and machine learning
• Thesis topic – detecting causality in scientific explanatory essays
Open Source GitHub Repositories
• Look under https://github.com/DiceTechJobs
• Set of Solr plugins: https://github.com/DiceTechJobs/SolrPlugins
• Tutorial for this talk: https://github.com/DiceTechJobs/RelevancyTuning
Overview
1. Approaches to Relevancy Tuning
2. Automated Relevancy Tuning – using Reinforcement Learning
3. Feedback Loops – Dangers of Closed Loop Learning Systems
Motivations for Talk
• Last year I talked about conceptual search and how that could be used to improve recall
• This year I want to focus on techniques to improve precision
• Novelty
Relevancy Tuning
Finding the Optimal Search Engine Configuration
• Most companies initially approach this as a very ad hoc, manual process:
  • Follow ‘best practices’ and make some initial educated guesses as to the best settings
  • Manually tune the parameters on a number of key user queries
• The search engine parameters should be tuned to reflect how your users search
• Relevancy is a hard concept to define, but it is whatever your users consider an optimal search experience, so it should be informed by their search behavior
What Solr Configuration Options Influence Relevancy?
Solr and Lucene provide many configuration options that impact search relevancy, including the following (see the sketch after this list):
• Which query parser – dismax, edismax, LuceneParser, etc.
• Field boosts – qf parameter
• Phrase boosts – pf, pf2, pf3 parameters
• Minimum should match – mm parameter
• Similarity class – default similarity, BM25, TF-IDF, custom, or one of many others
• Boost queries – boost, bf, bq, etc.
• Edismax tie parameter – recommended value ≈ 0.1
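To make these concrete, here is a minimal sketch of an edismax request that sets several of these parameters via the standard /select endpoint; the collection name, field names, and boost values are hypothetical, not Dice.com's production settings:

```python
# A minimal sketch of setting edismax relevancy parameters on a raw Solr
# query. The "jobs" collection, field names, and boosts are hypothetical.
import requests

params = {
    "defType": "edismax",                      # use the edismax query parser
    "q": "java developer",
    "qf": "title^8 skills^5 description^2",    # field boosts (hypothetical)
    "pf2": "title^6 description^3",            # boost adjacent word pairs
    "mm": "2<75%",                             # beyond 2 terms, 75% must match
    "tie": "0.1",                              # recommended tie value from the slide
    "bq": "employment_type:full_time^1.5",     # boost query for a use case
    "fl": "id,title,score",
    "rows": 10,
}

resp = requests.get("http://localhost:8983/solr/jobs/select", params=params)
for doc in resp.json()["response"]["docs"]:
    print(doc["score"], doc["title"])
```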
Some General Tips on Relevancy Tuning
Remove Noise Chars
• Ensure punctuation characters and plurality are removed from each field using the analysis chain
  • e.g. ‘q=developer’ should match ‘developer,’, ‘developer.’, ‘developer’s’ and ‘developers’
When Using Stemming / Synonyms – Use Copy Fields + Edismax
• Use copy fields to apply stemming and synonyms to existing fields
• Allows different boosts to be applied to stemmed and synonym matches
• Set the field boosts to be lower on the stemmed and synonym copy fields (see the sketch below)
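A minimal sketch of what such boosting might look like in the qf parameter; the exact-field and copy-field names and the boost values are hypothetical:

```python
# Exact fields get the highest boosts; the stemmed and synonym copy fields
# still match (improving recall), but contribute less to the score.
params = {
    "defType": "edismax",
    "q": "developers",
    "qf": "title^10 title_stemmed^3 title_synonyms^2 "
          "description^4 description_stemmed^1",
}
```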
Some General Tips on Relevancy Tuning
Use Boost Queries for Specific Query Use Cases
• Edismax bq parameter – allows boosting of matches to nested queries
• See chapter 7 of Relevant Search – good coverage of this strategy
Make Good Use of Phrase Query Boosts
• Use the pf, pf2 and pf3 parameters in edismax to give preference to multi-term matches (see the sketch below)
• pf2 and pf3 often give better performance than pf, which requires an exact match for all query terms
Caveat Emptor: monitor the impact of these changes on query performance (QTime) and index size
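A minimal sketch contrasting pf, pf2 and pf3 on one query; all field names, boosts, and the bq clause are hypothetical:

```python
# pf requires the whole query as one phrase; pf2/pf3 reward any adjacent
# pairs/triples of query terms, so they fire far more often.
params = {
    "defType": "edismax",
    "q": "senior java web developer",
    "qf": "title^5 description^2",
    "pf":  "title^10",  # only matches "senior java web developer" as one phrase
    "pf2": "title^6",   # matches pairs: "senior java", "java web", "web developer"
    "pf3": "title^4",   # matches triples: "senior java web", "java web developer"
    "ps": 1,            # phrase slop: allow one intervening term
    "bq": "posted_date:[NOW-30DAYS TO NOW]^2",  # boost recent documents
}
```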
The ‘Golden’ Test Collection
• To tune your search parameters, you can gather a dataset of relevancy judgements
• For a set of important queries, the dataset contains the top results returned, each annotated for relevancy
• This dataset can be collected using domain experts and a user interface designed for this task
• Commercial examples:
  • Quepid – developed by OpenSource Connections
  • Fusion UI Relevancy Workbench – part of the Fusion offering from Lucidworks
Search Log Capture
• An alternative to manually collecting relevancy judgements is to collect them directly from your users
• For each user search on the site, capture:
  • The user’s query and a timestamp
  • Any filters applied
  • Result impressions and clicks
• You can then turn this into a test collection by assuming that the results people click on are more relevant than those they don’t (see the sketch below)
• The time spent on the results page is also a great indication of how relevant a result was to the original search
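A minimal sketch of a search-log record capturing the fields above, and of deriving graded judgements from clicks and dwell time; the schema and the dwell-time threshold are hypothetical:

```python
# Map each impressed doc to a graded label: 0 = skipped, 1 = clicked,
# 2 = clicked with a long dwell time (hypothetical 30s threshold).
from dataclasses import dataclass, field

@dataclass
class SearchEvent:
    query: str
    timestamp: float
    filters: dict            # e.g. {"location": "Boston", "remote": True}
    impressions: list        # doc ids shown, in ranked order
    clicks: dict = field(default_factory=dict)  # doc id -> dwell time (secs)

def judgements(event: SearchEvent) -> dict:
    grades = {}
    for doc_id in event.impressions:
        dwell = event.clicks.get(doc_id)
        if dwell is None:
            grades[doc_id] = 0
        elif dwell >= 30.0:
            grades[doc_id] = 2
        else:
            grades[doc_id] = 1
    return grades
```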
Relevancy Tuning with a Test Collection
• Now that you have a test collection, you can use it to tune your search engine configuration
• Using the test collection, you can measure the relevancy of a set of searches with IR metrics such as the following (MAP and precision at k are sketched after this list):
  • MAP (Mean Average Precision)
  • Precision at k (precision computed over the top k documents retrieved)
  • NDCG (Normalized Discounted Cumulative Gain)
• Regression testing – this allows you to build a set of regression tests to ensure configuration changes both improve relevancy and don’t break certain queries
• Manually tuning search configurations is still a time consuming and inefficient process
• Is there a better way?
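A minimal sketch of MAP and precision@k over binary relevance labels; `ranked` is the engine's result order for one query, `relevant` the set of doc ids judged relevant:

```python
def precision_at_k(ranked, relevant, k):
    top_k = ranked[:k]
    return sum(1 for d in top_k if d in relevant) / k

def average_precision(ranked, relevant):
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i        # precision at each relevant hit
    return total / max(len(relevant), 1)

def mean_average_precision(runs):
    """`runs` is a list of (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```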
Automated Relevancy Tuning Approaches
1. Supervised Machine Learning?
  • No – you cannot optimize your search configuration this way without a computable gradient
2. Grid Search?
  • Perform a brute force search over the range of possible configuration parameters
  • Very slow and inefficient – it is not able to learn which ranges of settings work best
3. Black Box Optimization Algorithms?
  • Optimization algorithms exist that attempt to find the optimum value of an unknown function in as few iterations as possible
  • They perform a much smarter search of the parameter space than grid search
Black Box Optimization Algorithms
• Use an optimization algorithm to optimize a ‘black box’ function
• Black box function – provide the optimization algorithm with a function that takes a set of parameters as inputs and computes a score
• The black box algorithm will then try to choose parameter settings that optimize the score
• This can be thought of as a form of reinforcement learning
• These algorithms will intelligently search the space of possible search configurations to arrive at a solution
• Example algorithms include Bayesian Optimization, Simulated Annealing, and Genetic Algorithms (hence the talk title)
Example Black Box Function for Search Relevancy
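The original slide presented this function as a code screenshot that is not preserved in the transcript. Below is a minimal sketch of what such a function might look like, assuming a hypothetical `run_query` helper that issues an edismax search and returns ranked doc ids, plus the `mean_average_precision` function from the earlier sketch:

```python
# A reconstruction, not Dice.com's implementation: map a vector of field
# boosts to MAP over a test collection, negated so a minimizer can be used.
def make_objective(test_collection, solr_url="http://localhost:8983/solr/jobs/select"):
    def objective(params):
        title_boost, skills_boost, desc_boost = params
        qf = f"title^{title_boost} skills^{skills_boost} description^{desc_boost}"
        runs = []
        for query, relevant in test_collection:      # relevant: set of doc ids
            ranked = run_query(solr_url, query, qf)  # hypothetical helper
            runs.append((ranked, relevant))
        return -mean_average_precision(runs)         # minimize the negative
    return objective
```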
Making it Work
• There are some excellent, mature libraries for doing this sort of thing, e.g.:
  • DEAP – Distributed Evolutionary Algorithms in Python (hence the talk title)
  • Scikit-Optimize – general optimization library built by a team at CERN headed by Tim Head
• These libraries are very easy to use; however, getting them to optimize your search configuration is a little trickier (see the sketch below)
• They tend to work better when optimizing a small set of parameters at a time – 1 to 4 works well
• We achieved an improvement of 5% in MAP@5 for our MLT configuration, and will be A/B testing changes to search before EOY
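As an illustration, a minimal sketch of driving the objective above with Scikit-Optimize's `gp_minimize` (Bayesian optimization); the boost ranges and call budget are hypothetical:

```python
# `make_objective` and `test_collection` come from the earlier sketches.
from skopt import gp_minimize

objective = make_objective(test_collection)

result = gp_minimize(
    objective,
    dimensions=[(0.5, 20.0),   # title boost range
                (0.5, 20.0),   # skills boost range
                (0.5, 20.0)],  # description boost range
    n_calls=200,               # a few hundred evaluations, as advised below
    random_state=42,
)
print("best MAP:", -result.fun, "best boosts:", result.x)
```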
Making it Work
• To optimize a large set of search parameters, start with the most important ones and optimize those while keeping the rest fixed
• If you are using search logs to optimize the search configuration, use a large number of searches (at least a few thousand) to ensure you are performing a robust enough test
• For most search collections of a reasonable size, running these optimizations over your search collection will take time – set it up on a server, parallelize where possible, and leave it running overnight
• Typically you will want to allow the algorithm to try at least a few hundred variations of each parameter set to find a good range of settings
• Ideally – first optimize your search configuration against a set of relevancy judgements acquired from domain experts, deploy to production, and then use the search logs to further tune against your users’ search behavior
Use a Separate Testing Dataset to Validate Improvements
• As with any machine learning problem, it is essential to use one dataset to learn from, and a second, separate dataset to validate your results – this prevents ‘overfitting’
• Overfitting in this context means that the search parameters are over-tuned to your initial dataset, such that the search engine performs worse on new data than with the current configuration
• Once you have an optimal set of configuration parameters that you are happy with, these should be evaluated on a second set of relevancy judgements to ensure the same performance gains are seen there too (a minimal split is sketched below)
• This applies to both manual and automatic tuning of the search engine configuration: humans can overfit a dataset just as easily as an algorithm can
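A minimal sketch of holding out a validation set of judged queries before tuning; `test_collection` is the list of (query, relevant doc ids) pairs from the earlier sketches, and the 80/20 ratio is a common choice rather than a rule:

```python
import random

random.seed(42)
shuffled = test_collection[:]
random.shuffle(shuffled)

cut = int(0.8 * len(shuffled))
train_queries = shuffled[:cut]   # tune the configuration against these
valid_queries = shuffled[cut:]   # evaluate the final configuration here, once
```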
Some Other Things to Try
• Auto-tune other Solr parameters – phrase slop, mm settings, the similarity class used
• You can evolve a more optimal ranking function:
  • Either tweak the settings of the existing ranking functions (see the SweetSpotSimilarityFactory class)
  • Or use Genetic Programming to evolve a better ranking function for your dataset (see the sketch below)
• Genetic Programming is an evolutionary algorithm that can evolve programs and equations
• Some relevant papers exist, including a good introductory paper (though not very recent)
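A minimal sketch of how candidate ranking functions could be represented with DEAP's GP tools; the feature names (tf, idf, field length) are hypothetical stand-ins for whatever per-term and per-document statistics you expose, and a real run would add a fitness function (e.g. MAP over your test collection) and an evolution loop:

```python
import operator, random
from deap import gp

# Primitive set over hypothetical ranking features
pset = gp.PrimitiveSet("rank", 3)
pset.renameArguments(ARG0="tf", ARG1="idf", ARG2="field_len")
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addPrimitive(max, 2)
pset.addEphemeralConstant("const", lambda: random.uniform(0.1, 2.0))

# Generate one random candidate ranking function and score a (tf, idf, len)
expr = gp.PrimitiveTree(gp.genHalfAndHalf(pset, min_=1, max_=3))
score_fn = gp.compile(expr, pset)
print(expr, "->", score_fn(2.0, 4.5, 120.0))
```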
Things to Consider
• Building a Machine Learned Ranking system is a premature optimization if you haven’t first optimized your search configuration
• Relevancy tuning and MLR both primarily optimize for precision over recall, due to the nature of the training data
• For techniques to improve recall, see conceptual / semantic search:
  • Simon Hughes – “Conceptual Search” (Revolution 2015)
  • Trey Grainger – “Enhancing Relevancy Through Personalization and Semantic Search” (Revolution 2013)
  • Doug Turnbull and John Berryman – chapter 11 of Relevant Search
Feedback Loops – Dangers of Closed Loop Learning Systems
Building a Machine Learning System
[Diagram: users interact with the system and produce data; machine learning turns that data into a model]
1. Users interact with the system to produce data
2. Machine learning algorithms turn that data into a model
What happens if the model’s predictions influence the users’ behavior?
Positive Feedback Loop
[Diagram: the same loop as before, but the model’s predictions now change user behavior, which feeds back into the data]
1. Users interact with the system to produce data
2. Machine learning algorithms turn that data into a model
3. The model changes user behavior, modifying its own future training data
Preventing Positive Feedback Loops
1. Isolate a subset of data from being influenced by the model, and use this data to train the system
  • E.g. leave a small proportion of user searches un-ranked by the MLR model
  • E.g. generate a subset of recommendations at random, or by using an unsupervised model
2. Use a reinforcement learning model instead (such as a multi-armed bandit) – the system will dynamically adapt to the users’ behavior, balancing exploring different hypotheses with exploiting what it has learned to produce accurate predictions (a minimal bandit sketch follows)
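A minimal sketch of an epsilon-greedy multi-armed bandit choosing between competing ranking configurations; the arm names, click-based reward, and 10% exploration rate are all hypothetical choices:

```python
import random

class EpsilonGreedyBandit:
    def __init__(self, arms, epsilon=0.1):
        self.arms = arms                      # e.g. names of ranking configs
        self.epsilon = epsilon
        self.pulls = {a: 0 for a in arms}
        self.wins = {a: 0 for a in arms}

    def choose(self):
        if random.random() < self.epsilon:    # explore: random configuration
            return random.choice(self.arms)
        # exploit: configuration with the best observed click rate so far
        return max(self.arms, key=lambda a: self.wins[a] / max(self.pulls[a], 1))

    def update(self, arm, clicked):
        self.pulls[arm] += 1
        self.wins[arm] += int(clicked)

# Usage: pick a ranker per search, then record whether the user clicked
bandit = EpsilonGreedyBandit(["baseline", "tuned_boosts", "mlr_model"])
arm = bandit.choose()
bandit.update(arm, clicked=True)
```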
THE END
• Thank you for listening
• Any questions?