Self-learned Relevancy with Apache Solr
![Page 1: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/1.jpg)
Self-learned relevancy with Apache Solr
Trey Grainger
SVP of Engineering, Lucidworks
NYC Lucene/Solr, 2017.03.30
![Page 2: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/2.jpg)
About Me
Trey Grainger, SVP of Engineering
• Previously Director of Engineering @ CareerBuilder
• MBA, Management of Technology – Georgia Tech
• BA, Computer Science, Business, & Philosophy – Furman University
• Information Retrieval & Web Search - Stanford University
Other fun projects:
• Co-author of Solr in Action, plus numerous research papers
• Frequent conference speaker
• Founder of Celiaccess.com, the gluten-free search engine
• Lucene/Solr contributor
![Page 3: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/3.jpg)
Agenda
• Apache Solr Overview
• Lucidworks Fusion Overview
• Core Search / Relevancy
- Keyword Search
- Multi-lingual Text Analysis
- Relevancy
• Reflected Intelligence
- Signals (Demo)
- Recommendations (Demo)
- Relevancy Tuning
- Learning to Rank (Demo)
• Semantic Search
- Entity Extraction (Demo)
- Query Parsing (Demo)
- Semantic Knowledge Graph (Demo)
• Streaming Expressions
![Page 4: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/4.jpg)
Basic Keyword Search (inverted index, tf-idf, BM25, multilingual text analysis, query formulation, etc.)
Taxonomies / Entity Extraction (entity recognition, ontologies, synonyms, etc.)
Query Intent (query classification, semantic query parsing, concept expansion, rules, clustering, classification)
Relevancy Tuning (signals, A/B testing/genetic algorithms, Learning to Rank, Neural Networks)
Self-learning
Data-driven App Sophistication
![Page 5: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/5.jpg)
what do you do?
![Page 6: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/6.jpg)
![Page 7: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/7.jpg)
Search-Driven Everything
Customer Service
Customer Insights
Fraud Surveillance
Research Portal
Online Retail
Digital Content
![Page 8: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/8.jpg)
Lucidworks enables Search-Driven Everything
[Diagram: Data Acquisition; Indexing & Streaming; Smart Access API; Recommendations & Alerts; Analytics & Insights; Extreme Relevancy; applied to Customer Service, Research Portal, Digital Content, Customer Insights, Fraud Surveillance, and Online Retail.]
• Access all your data in a number of ways from one place.
• Secure storage and processing from Solr and Spark.
• Acquire data from any source with pre-built connectors and adapters.
Machine learning and advanced analytics turn all of your apps into intelligent data-driven applications.
![Page 9: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/9.jpg)
Apache Solr
![Page 10: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/10.jpg)
“Solr is the popular, blazing-fast,
open source enterprise search
platform built on Apache Lucene™.”
![Page 11: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/11.jpg)
Key Solr Features:
● Multilingual Keyword search
● Relevancy Ranking of results
● Faceting & Analytics (nested / relational)
● Highlighting
● Spelling Correction
● Autocomplete/Type-ahead Prediction
● Sorting, Grouping, Deduplication
● Distributed, Fault-tolerant, Scalable
● Geospatial search
● Complex Function queries
● Recommendations (More Like This)
● Graph Queries and Traversals
● SQL Query Support
● Streaming Aggregations
● Batch and Streaming processing
● Highly Configurable / Plugins
● Learning to Rank
● Building machine-learning models
● … many more
*Source: Solr in Action, chapter 2
![Page 12: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/12.jpg)
The standard for enterprise search: 90% of the Fortune 500 use Solr.
![Page 13: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/13.jpg)
Lucidworks Fusion
![Page 14: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/14.jpg)
DFW Data Science
![Page 15: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/15.jpg)
![Page 16: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/16.jpg)
All Your Data
![Page 17: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/17.jpg)
• Over 50 connectors to integrate all your data
• Robust parsing framework to seamlessly ingest all your document types
• Point-and-click indexing configuration and iterative simulation of results for full control over your ETL process
• Your security model enforced end-to-end, from ingest to search, across your different data sources
![Page 18: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/18.jpg)
Experience Management
![Page 19: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/19.jpg)
• Relevancy tuning: Point-and-click query pipeline configuration allows fine-grained control of results.
• Machine-driven relevancy: Signal aggregations learn and automatically tune relevancy and drive recommendations out of the box.
• Powerful pipeline stages: Customize fields, stages, synonyms, boosts, facets, machine learning models, your own scripted behavior, and dozens of other powerful search stages.
• Turnkey search UI (Lucidworks View): Build a sophisticated end-to-end search application in just hours.
![Page 20: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/20.jpg)
Operational Simplicity
![Page 21: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/21.jpg)
[Architecture diagram, security built in: Apache Solr (shards); Apache ZooKeeper (leader election, load balancing, shared config management); Apache Spark (workers, cluster manager); Core Services (NLP, recommenders/signals, blob storage, pipelines, scheduling, alerting/messaging, connectors); REST API; Admin UI; Lucidworks View; data sources: logs, file, web, database, cloud, HDFS (optional).]
![Page 22: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/22.jpg)
• 75% decrease in development time
• Licensing costs cut by 50%
"With Fusion's out-of-the-box capabilities, we skipped months in our dev cycle so we could focus our team where they would have the most impact. We cut our licensing costs by 50% and improved application usability. The Lucidworks professional services team amplified our success even further. We're all Fusion from here on out!"
Lourduraju Pamishetty, Senior IT Application Architect
![Page 23: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/23.jpg)
• Seamless integration of your entire search & analytics platform
• All capabilities exposed through secured APIs, so you can use our UI or build your own.
• End-to-end security policies can be applied out of the box to every aspect of your search ecosystem.
• Distributed, fault-tolerant scaling and supervision of your entire search application
![Page 24: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/24.jpg)
![Page 25: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/25.jpg)
Lucidworks Fusion
![Page 26: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/26.jpg)
Fusion powers search for the brightest companies in the world.
![Page 27: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/27.jpg)
search & relevancy
![Page 28: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/28.jpg)
Basic Keyword Search
The beginning of a typical search journey
![Page 29: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/29.jpg)
The inverted index

What you SEND to Lucene/Solr:

| Document | Content Field |
|---|---|
| doc1 | once upon a time, in a land far, far away |
| doc2 | the cow jumped over the moon. |
| doc3 | the quick brown fox jumped over the lazy dog. |
| doc4 | the cat in the hat |
| doc5 | The brown cow said "moo" once. |
| … | … |

How the content is INDEXED into Lucene/Solr (conceptually):

| Term | Documents |
|---|---|
| a | doc1 [2x] |
| brown | doc3 [1x], doc5 [1x] |
| cat | doc4 [1x] |
| cow | doc2 [1x], doc5 [1x] |
| … | … |
| once | doc1 [1x], doc5 [1x] |
| over | doc2 [1x], doc3 [1x] |
| the | doc2 [2x], doc3 [2x], doc4 [2x], doc5 [1x] |
| … | … |
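To make the conceptual index concrete, here is a minimal Python sketch that builds the same term-to-document mapping from the five example documents. It assumes a deliberately naive analyzer (lowercase, keep runs of letters); real Lucene analysis is configurable, and Lucene's on-disk postings format is far more compact than a dictionary.

```python
import re
from collections import Counter, defaultdict

# The five example documents from the slide above.
DOCS = {
    "doc1": "once upon a time, in a land far, far away",
    "doc2": "the cow jumped over the moon.",
    "doc3": "the quick brown fox jumped over the lazy dog.",
    "doc4": "the cat in the hat",
    "doc5": 'The brown cow said "moo" once.',
}

def tokenize(text):
    # Naive analysis (an assumption): lowercase, then keep runs of letters.
    return re.findall(r"[a-z]+", text.lower())

def build_inverted_index(docs):
    # term -> {doc_id: number of occurrences of the term in that document}
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index

index = build_inverted_index(DOCS)
for term in ["a", "brown", "cow", "once", "the"]:
    print(term, index[term])
```

The counts printed for these terms match the [Nx] annotations in the table above, for example "the" appears twice in doc2, doc3, and doc4 and once in doc5.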
![Page 30: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/30.jpg)
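Matching queries to documents

/solr/select/?q=apache solr

| Term | Documents |
|---|---|
| … | … |
| apache | doc1, doc3, doc4, doc5 |
| … | … |
| hadoop | doc2, doc4, doc6 |
| … | … |
| solr | doc1, doc3, doc4, doc7, doc8 |
| … | … |

[Venn diagram for the query "apache solr": doc5 matches only apache; doc7 and doc8 match only solr; doc1, doc3, and doc4 match both.]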
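The Venn diagram above amounts to set operations over posting lists. A minimal Python sketch, assuming the standard query parser's default OR operator, which returns the union. Requiring every term (AND) returns only the overlap, and with OR the overlapping documents typically score higher because they match both terms.

```python
# Posting lists from the slide above: term -> set of matching document ids.
POSTINGS = {
    "apache": {"doc1", "doc3", "doc4", "doc5"},
    "hadoop": {"doc2", "doc4", "doc6"},
    "solr": {"doc1", "doc3", "doc4", "doc7", "doc8"},
}

def match_or(terms):
    # OR semantics: a document matches if it contains any query term (union).
    result = set()
    for term in terms:
        result |= POSTINGS.get(term, set())
    return result

def match_and(terms):
    # AND semantics: a document must contain every query term (intersection).
    sets = [POSTINGS.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()

query = "apache solr".split()
print(sorted(match_or(query)))   # union of both posting lists
print(sorted(match_and(query)))  # only the documents in both lists
```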
![Page 31: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/31.jpg)
Text Analysis
Generating terms to index from raw text
![Page 32: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/32.jpg)
Text Analysis in Solr
A text field in Lucene/Solr has an Analyzer containing:
① Zero or more CharFilters: take incoming text and "clean it up" before it is tokenized
② One Tokenizer: splits incoming text into a Token Stream containing zero or more Tokens
③ Zero or more TokenFilters: examine and optionally modify each Token in the Token Stream
*From Solr in Action, Chapter 6
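The three-stage analyzer can be sketched in a few lines of Python. The function names below are hypothetical stand-ins for a CharFilter, a Tokenizer, and TokenFilters, not Lucene classes, and real Lucene tokens also carry positions and offsets that this sketch omits.

```python
import re
from typing import Callable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]

def html_strip(text: str) -> str:
    # CharFilter stand-in: replace anything that looks like an HTML tag with a space.
    return re.sub(r"<[^>]+>", " ", text)

def whitespace_tokenizer(text: str) -> List[str]:
    # Tokenizer stand-in: split on runs of whitespace.
    return text.split()

def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]

def stop_words(tokens: List[str], stops=frozenset({"the", "a", "in"})) -> List[str]:
    return [t for t in tokens if t not in stops]

def analyze(text: str, char_filters: List[CharFilter], tokenizer: Tokenizer,
            token_filters: List[TokenFilter]) -> List[str]:
    # 1) zero or more CharFilters, 2) exactly one Tokenizer, 3) zero or more TokenFilters
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

print(analyze("<p>The Cat in the Hat</p>", [html_strip], whitespace_tokenizer,
              [lowercase, stop_words]))  # ['cat', 'hat']
```

Order matters: here the stop-word filter runs after lowercasing, so "The" is removed; swapping the two filters would keep it.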
![Page 33: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/33.jpg)
![Page 34: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/34.jpg)
![Page 35: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/35.jpg)
![Page 36: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/36.jpg)
Multi-lingual Text Analysis
Analyzing text across multiple languages
![Page 37: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/37.jpg)
Example English Analysis Chains

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="lang/en_protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="lang/en_synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
![Page 38: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/38.jpg)
Per-language Analysis Chains
*Some of the 32 different language configurations in Appendix B of Solr in Action
![Page 39: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/39.jpg)
![Page 40: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/40.jpg)
Which Stemmer do I choose?
*From Solr in Action, Chapter 14
![Page 41: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/41.jpg)
Common English Stemmers
*From Solr in Action, Chapter 14
![Page 42: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/42.jpg)
When Stemming goes awry
Fixing Stemming Mistakes:
• Unfortunately, every stemmer will have problem cases that aren't handled as you would expect
• Thankfully, stemmers can be overridden
• KeywordMarkerFilter: protects a list of terms you specify from being stemmed
• StemmerOverrideFilter: applies a list of custom term mappings you specify
Alternate strategy:
• Use lemmatization (root-form analysis) instead of stemming
• Commercial vendors help tremendously in this space
• The Hunspell stemmer enables dictionary-based support of varying quality in over 100 languages
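The two override mechanisms can be illustrated with a minimal Python sketch. The stemmer here is a deliberately crude, hypothetical suffix stripper (not Porter or KStem), chosen so that it makes the kind of mistake described above ("news" becomes "new"). The `protected` set plays roughly the role of KeywordMarkerFilter, and the `overrides` map roughly the role of StemmerOverrideFilter.

```python
# Hypothetical toy suffix stripper: NOT the Porter or KStem algorithm.
SUFFIXES = ("ing", "es", "s", "ed")

def toy_stem(token):
    for suffix in SUFFIXES:
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(tokens, protected=frozenset(), overrides=None):
    overrides = overrides or {}
    out = []
    for token in tokens:
        if token in overrides:
            # Like StemmerOverrideFilter: use the custom mapping and skip the stemmer.
            out.append(overrides[token])
        elif token in protected:
            # Like KeywordMarkerFilter: keep the token exactly as-is.
            out.append(token)
        else:
            out.append(toy_stem(token))
    return out

tokens = ["running", "news", "mice", "jobs"]
print(analyze(tokens))  # ['runn', 'new', 'mice', 'job']
print(analyze(tokens, protected={"news"},
              overrides={"mice": "mouse", "running": "run"}))  # ['run', 'news', 'mouse', 'job']
```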
![Page 43: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/43.jpg)
Relevancy
Scoring the results, returning the best matches
![Page 44: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/44.jpg)
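Classic Lucene Relevancy Algorithm (now switched to BM25):
*Source: Solr in Action, chapter 3

$$\mathrm{Score}(q, d) = \sum_{t \in q} \left( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t, d) \right) \cdot \mathrm{coord}(q, d) \cdot \mathrm{queryNorm}(q)$$

Where: t = term; d = document; q = query; f = field

$$\mathrm{tf}(t \in d) = \mathrm{numTermOccurrencesInDocument}^{1/2}$$

$$\mathrm{idf}(t) = 1 + \log\left(\frac{\mathrm{numDocs}}{\mathrm{docFreq} + 1}\right)$$

$$\mathrm{coord}(q, d) = \frac{\mathrm{numTermsInDocumentFromQuery}}{\mathrm{numTermsInQuery}}$$

$$\mathrm{queryNorm}(q) = \frac{1}{\mathrm{sumOfSquaredWeights}^{1/2}}$$

$$\mathrm{sumOfSquaredWeights} = q.\mathrm{getBoost}()^2 \cdot \sum_{t \in q} \left( \mathrm{idf}(t) \cdot t.\mathrm{getBoost}() \right)^2$$

$$\mathrm{norm}(t, d) = d.\mathrm{getBoost}() \cdot \mathrm{lengthNorm}(f) \cdot f.\mathrm{getBoost}()$$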
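Plugging the definitions above into code makes the interplay of the factors easier to see. This Python sketch assumes every boost (query, term, document, field) is 1, uses the natural log, and takes lengthNorm(f) = 1/sqrt(number of terms in the field) without the lossy one-byte norm encoding Lucene applies, so its numbers will not match Lucene's exactly.

```python
import math

def classic_score(query_terms, doc_terms, doc_freq, num_docs):
    """Classic Lucene TF-IDF score for one field, with every boost fixed at 1.

    query_terms: non-empty list of query terms
    doc_terms:   non-empty list of analyzed terms in the document's field
    doc_freq:    term -> number of documents containing the term
    num_docs:    total number of documents in the index
    """
    def idf(t):
        return 1.0 + math.log(num_docs / (doc_freq.get(t, 0) + 1))

    # norm(t, d) with all boosts = 1 reduces to lengthNorm(f).
    length_norm = 1.0 / math.sqrt(len(doc_terms))
    # queryNorm(q) = 1 / sqrt(sum over query terms of idf(t)^2) when boosts = 1.
    query_norm = 1.0 / math.sqrt(sum(idf(t) ** 2 for t in query_terms))

    total = 0.0
    matched = 0
    for t in query_terms:
        freq = doc_terms.count(t)
        if freq == 0:
            continue
        matched += 1
        total += math.sqrt(freq) * idf(t) ** 2 * length_norm
    coord = matched / len(query_terms)  # fraction of query terms found in the doc
    return total * coord * query_norm

both = classic_score(["apache", "solr"], ["apache", "solr", "search"], {"apache": 4, "solr": 5}, 8)
one = classic_score(["apache", "solr"], ["apache", "lucene", "search"], {"apache": 4, "solr": 5}, 8)
print(both > one)  # True: matching both terms wins, helped by coord(q, d)
```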
![Page 45: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/45.jpg)
TF * IDF
• Term Frequency: "How well does a term describe a document?"
– Measure: how often a term occurs per document
• Inverse Document Frequency: "How important is a term overall?"
– Measure: how rare the term is across all documents
*Source: Solr in Action, chapter 3
![Page 46: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/46.jpg)
That's great, but what about domain-specific knowledge?
• News search: popularity and freshness drive relevance
• Restaurant search: geographical proximity and price range are critical
• Ecommerce: likelihood of a purchase is key
• Movie search: more popular titles are generally more relevant
• Job search: category of job, salary range, and geographical proximity matter
TF * IDF of keywords can't hold its own against good domain-specific relevance factors!
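One common way to fold a domain factor such as freshness into ranking is to multiply the keyword score by a decay function. In Solr this is typically a function query like recip(ms(NOW,pubdate),3.16e-11,1,1) passed to edismax's boost parameter, where pubdate is a hypothetical field name. The Python sketch below reproduces recip's arithmetic, a/(m*x + b); the log-dampened popularity factor and the multiplicative combination are illustrative assumptions, not a recommended formula.

```python
import math

MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # about 3.156e10 ms, hence m = 3.16e-11

def recip(x, m, a, b):
    # Same arithmetic as Solr's recip(x,m,a,b) function query: a / (m*x + b).
    return a / (m * x + b)

def boosted_score(keyword_score, age_ms, popularity):
    # Freshness: 1.0 for a brand-new item, about 0.5 at one year old, then decaying.
    freshness = recip(age_ms, 1 / MS_PER_YEAR, 1, 1)
    # Hypothetical popularity factor, log-dampened so huge counts don't dominate.
    popularity_boost = 1 + math.log1p(popularity)
    return keyword_score * freshness * popularity_boost

print(boosted_score(2.0, 0, 0))            # 2.0: new item, no popularity signal
print(boosted_score(2.0, MS_PER_YEAR, 0))  # about 1.0: one year old
```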
![Page 47: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/47.jpg)
what is “reflected intelligence”?
![Page 48: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/48.jpg)
The Three C's
• Content: Keywords and other features in your documents
• Collaboration: How others have chosen to interact with your system
• Context: Available information about your users and their intent
Reflected Intelligence: "Leveraging previous data and interactions to improve how new data and interactions should be interpreted"
![Page 49: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/49.jpg)
Feedback Loops
User searches → User sees results → User takes an action → Users' actions inform system improvements
![Page 50: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/50.jpg)
Examples of Reflected Intelligence
● Recommendation algorithms
● Building user profiles from past searches, clicks, and other actions
● Identifying correlations between keywords/phrases
● Building out automatically-generated ontologies from content and queries
● Determining relevancy judgements (precision, recall, nDCG, etc.) from click logs
● Learning to Rank: using relevancy judgements and machine learning to train a relevance model
● Discovering misspellings, synonyms, acronyms, and related keywords
● Disambiguation of keyword phrases with multiple meanings
● Learning what's important in your content
![Page 51: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/51.jpg)
Consider what you know about users
John lives in Boston but wants to move to New York or possibly another big city. He is currently a sales manager but wants to move towards business development.
Irene is a bartender in Dublin and is only interested in jobs within 10KM of her location in the food service industry.
Irfan is a software engineer in Atlanta and is interested in software engineering jobs at a Big Data company. He is happy to move across the U.S. for the right job.
Jane is a nurse educator in Boston seeking between $40K and $60K.
*Example from chapter 16 of Solr in Action
![Page 52: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/52.jpg)
Query for Jane
Jane is a nurse educator in Boston seeking between $40K and $60K

http://localhost:8983/solr/jobs/select/?
fl=jobtitle,city,state,salary&
q=(
jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10
)
AND (
(city:"Boston" AND state:"MA")^15
OR state:"MA")
AND _val_:"map(salary, 40000, 60000, 10, 0)"

*Example from chapter 16 of Solr in Action
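The query above is spread over several lines for readability, but it must be URL-encoded before it is sent. A minimal Python sketch that only builds the request URL (it does not contact Solr), plus a local mirror of Solr's map(x,min,max,target,default) function, which returns target when min <= x <= max and default otherwise. The helper names are hypothetical.

```python
from urllib.parse import urlencode

def solr_map(x, lo, hi, target, default):
    # Mirrors Solr's map(x,min,max,target,default): target if lo <= x <= hi, else default.
    return target if lo <= x <= hi else default

def jane_query_url(base="http://localhost:8983/solr/jobs/select"):
    q = (
        '(jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10) '
        'AND ((city:"Boston" AND state:"MA")^15 OR state:"MA") '
        'AND _val_:"map(salary,40000,60000,10,0)"'
    )
    params = {"fl": "jobtitle,city,state,salary", "q": q}
    # urlencode percent-encodes quotes, ^, and commas, and turns spaces into '+'.
    return base + "?" + urlencode(params)

print(jane_query_url())
for salary in (41503, 56183, 71359):
    print(salary, solr_map(salary, 40000, 60000, 10, 0))
```

Applied to the salaries in the example results for Jane, the first two (41503 and 56183) fall inside the range and receive the +10 boost, while 71359 gets 0. Because the salary clause only adds to the score, out-of-range jobs can still be returned, just ranked lower.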
![Page 53: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/53.jpg)
Search Results for Jane
{ ...
"response":{"numFound":22,"start":0,"docs":[
{"jobtitle":" Clinical Educator (New England/ Boston)",
"city":"Boston",
"state":"MA",
"salary":41503},
…]}}
{"jobtitle":"Nurse Educator",
"city":"Braintree",
"state":"MA",
"salary":56183},
{"jobtitle":"Nurse Educator",
"city":"Brighton",
"state":"MA",
"salary":71359}
*Example documents available @ http://github.com/treygrainger/solr-in-action
![Page 54: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/54.jpg)
You just built a
recommendation engine!
![Page 55: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/55.jpg)
Can also integrate user behavior (ships with Fusion 3.1):
![Page 56: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/56.jpg)
Demo:
Signals & Recommendations
![Page 57: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/57.jpg)
![Page 58: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/58.jpg)
• 200%+ increase in click-through rates
• 91% lower TCO
• Fewer support tickets
• Increased customer satisfaction
![Page 59: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/59.jpg)
Relevancy Tuning
Improving ranking algorithms through experiments and models
![Page 60: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/60.jpg)
How to Measure Relevancy?
[Venn diagram: A = Retrieved Documents, C = Related Documents, B = their overlap, i.e. retrieved documents that are related.]
Precision = B/A
Recall = B/C
Problem:
Assume precision = 90% and recall = 100%, but the 10% of irrelevant documents are ranked at the top of the retrieved documents. Is that OK?
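The slide's "problem" case can be reproduced directly. This Python sketch uses hypothetical document ids; because precision and recall are computed over sets, they cannot tell whether the one irrelevant result sits at rank 1 or rank 10.

```python
def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # region B: retrieved AND related
    precision = len(hits) / len(retrieved) if retrieved else 0.0  # B / A
    recall = len(hits) / len(relevant) if relevant else 0.0       # B / C
    return precision, recall

# 10 results, 9 relevant, every relevant doc found, but the junk doc is ranked first.
ranked = ["junk"] + [f"doc{i}" for i in range(1, 10)]
relevant = {f"doc{i}" for i in range(1, 10)}
print(precision_recall(ranked, relevant))                  # (0.9, 1.0)
print(precision_recall(list(reversed(ranked)), relevant))  # (0.9, 1.0) as well
```

Both orderings score identically, which is the motivation for a position-aware metric such as nDCG.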
![Page 61: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/61.jpg)
Normalized Discounted Cumulative Gain

Ranking: Ideal, Given

| Rank | Relevancy |
|---|---|
| 3 | 0.95 |
| 1 | 0.70 |
| 2 | 0.60 |
| 4 | 0.45 |

| Rank | Relevancy |
|---|---|
| 1 | 0.95 |
| 2 | 0.85 |
| 3 | 0.80 |
| 4 | 0.65 |

• Position is considered in quantifying relevancy.
• A labeled dataset is required.
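A minimal Python sketch of nDCG, assuming the common linear-gain form DCG = sum of rel_i / log2(i + 1) over 1-based positions i (some implementations use 2^rel - 1 as the gain instead). For simplicity it also assumes the judged list contains every relevant document, so the ideal DCG comes from sorting that same list.

```python
import math

def dcg(relevances):
    # Each result's relevance is discounted by log2(position + 1), positions starting at 1.
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    # Normalize by the DCG of the best possible ordering of the same judgements.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([0.95, 0.85, 0.80, 0.65]))  # 1.0: already in ideal order
print(ndcg([0.65, 0.80, 0.85, 0.95]))  # < 1.0: best documents pushed down
```

Unlike precision and recall, moving the most relevant document from position 4 to position 1 changes this score.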
![Page 62: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/62.jpg)
Learning to Rank
![Page 63: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/63.jpg)
Learning to Rank (LTR)
● It applies machine learning techniques to discover the combination of features that provides the best ranking.
● It requires a labeled set of documents with relevancy scores for a given set of queries.
● Features used for ranking are usually more computationally expensive than the ones used for matching.
● It typically re-ranks a subset of the matched documents (e.g. the top 1000).
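The "re-rank only the top of the list" idea can be sketched in Python with a linear model. The feature names and weights below are hypothetical stand-ins for a trained model. In Solr's LTR plugin the equivalent step is a re-rank query such as rq={!ltr model=myModel reRankDocs=100}, where myModel is a placeholder model name.

```python
def rerank(results, features, weights, top_k=1000):
    """Re-rank the top_k first-pass matches with a linear model; keep the rest in place.

    results:  doc ids in first-pass (matching) order
    features: doc id -> {feature name: value}; only needed for the top_k docs
    weights:  feature name -> learned weight (a hypothetical trained model)
    """
    head, tail = results[:top_k], results[top_k:]

    def model_score(doc):
        return sum(weights.get(name, 0.0) * value for name, value in features[doc].items())

    # The expensive model only runs on the head; sorted() is stable for ties.
    return sorted(head, key=model_score, reverse=True) + tail

features = {
    "d1": {"title_match": 0.2, "freshness": 0.1},
    "d2": {"title_match": 0.9, "freshness": 0.5},
    "d3": {"title_match": 0.5, "freshness": 0.9},
}
weights = {"title_match": 1.0, "freshness": 0.5}
print(rerank(["d1", "d2", "d3"], features, weights, top_k=2))  # ['d2', 'd1', 'd3']
```

With top_k=2, d3 keeps its first-pass position even though the model would score it above d1. That is the trade-off of limiting the costly features to a subset.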
![Page 64: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/64.jpg)
![Page 65: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/65.jpg)
Common LTR Algorithms
• RankNet* (Neural Network, boosted trees)
• LambdaMART* (set of regression trees)
• SVM Rank** (SVM classifier)
* http://research.microsoft.com/pubs/132652/MSR-TR-2010-82.pdf
** http://research.microsoft.com/en-us/people/hangli/cao-et-al-sigir2006.pdf
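RankNet learns from pairs of documents. A minimal Python sketch of its pairwise cost for one pair, following the form described in the Microsoft report footnoted above. The model's probability that document i should rank above j is P_ij = 1 / (1 + e^(-sigma*(s_i - s_j))), and the cost is the cross-entropy against the target probability. Here sigma defaults to 1, and the cost is rewritten in an algebraically equivalent, numerically stable form. LambdaMART builds on the gradients of this cost, which the sketch does not cover.

```python
import math

def softplus(x):
    # Numerically stable log(1 + e^x).
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))

def ranknet_pair_loss(s_i, s_j, target, sigma=1.0):
    """RankNet cross-entropy cost for one document pair.

    s_i, s_j: model scores; target: 1.0 if doc i should rank above doc j,
    0.0 if below, 0.5 if they are tied.
    """
    z = sigma * (s_i - s_j)
    # Equals -t*log(P) - (1-t)*log(1-P) with P = 1/(1 + e^-z), rewritten as
    # (1-t)*z + log(1 + e^-z) so that large |z| never produces log(0).
    return (1.0 - target) * z + softplus(-z)

print(ranknet_pair_loss(2.0, 0.0, 1.0))  # small: pair already in the right order
print(ranknet_pair_loss(0.0, 2.0, 1.0))  # large: pair in the wrong order
```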
![Page 66: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/66.jpg)
LambdaMart Example
Source: T. Grainger, K. AlJadda. "Reflected Intelligence: Evolving self-learning data systems". Georgia Tech, 2016
![Page 67: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/67.jpg)
Demo: Learning to Rank
![Page 68: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/68.jpg)
Obtaining Relevancy Judgements: Typical Methodologies
1) Hire employees, contractors, or interns
- Pros: Accuracy
- Cons: Expensive; not scalable (cost- or manpower-wise); data becomes stale
2) Crowdsource
- Pros: Less cost, more scalable
- Cons: Less accurate; data still becomes stale
Source: T. Grainger, K. AlJadda. "Reflected Intelligence: Evolving self-learning data systems". Georgia Tech, 2016
![Page 69: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/69.jpg)
Reflected Intelligence: Possible to infer relevancy judgements?

| Rank | Document ID |
|---|---|
| 1 | Doc1 |
| 2 | Doc2 |
| 3 | Doc3 |
| 4 | Doc4 |

| Query | Doc1 | Doc2 | Doc3 |
|---|---|---|---|
| Query | 0 | 1 | 1 |
| Query | 1 | 0 | 0 |

Source: T. Grainger, K. AlJadda. "Reflected Intelligence: Evolving self-learning data systems". Georgia Tech, 2016
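One simple way to turn interaction logs into judgements is click-through rate per (query, document) pair. The Python sketch below assumes a hypothetical log format of (query, shown documents, clicked documents) and made-up sessions. It is not the method from the cited work. Raw CTR also suffers from position bias (top-ranked results attract more clicks), which production systems usually correct for.

```python
from collections import defaultdict

def infer_judgements(sessions):
    """Estimate relevance of each (query, doc) pair as clicks / impressions.

    sessions: iterable of (query, shown_doc_ids, clicked_doc_ids) tuples
    (a hypothetical log format).
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for query, shown, clicked in sessions:
        for doc in shown:
            impressions[(query, doc)] += 1
            if doc in clicked:
                clicks[(query, doc)] += 1
    return {key: clicks[key] / n for key, n in impressions.items()}

# Hypothetical sessions for one query.
sessions = [
    ("query", ["Doc1", "Doc2", "Doc3"], {"Doc2", "Doc3"}),
    ("query", ["Doc1", "Doc2", "Doc3"], {"Doc1"}),
    ("query", ["Doc1", "Doc2", "Doc3"], {"Doc2"}),
]
for (query, doc), ctr in infer_judgements(sessions).items():
    print(query, doc, round(ctr, 2))
```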
![Page 70: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/70.jpg)
Automated Relevancy Benchmarking
[Chart: daily relevancy benchmark values from 10/1/16 to 10/10/16, comparing DefaultAlgorithm with Algorithm1 through Algorithm5.]
![Page 71: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/71.jpg)
The Relevancy Spectrum
• Traditional Keyword Search
• Recommendations
• Semantic Search
• User Intent
• Personalized Search
• Augmented Search
• Domain-aware Matching
![Page 72: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/72.jpg)
semantic search
![Page 73: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/73.jpg)
![Page 74: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/74.jpg)
Building a Taxonomy of Entities
Many ways to generate this:
• Topic Modelling
• Clustering of documents
• Statistical Analysis of interesting phrases
- Word2Vec / GloVe / Dice Conceptual Search
• Buy a dictionary (often doesn't work for domain-specific search problems)
• Generate a model of domain-specific phrases by mining query logs for commonly searched phrases within the domain*
* K. Aljadda, M. Korayem, T. Grainger, C. Russell. "Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific Jargon," in IEEE Big Data 2014.
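The last bullet, mining query logs for commonly searched phrases, can start as a simple n-gram frequency count. The Python sketch below uses a hypothetical query log and a hypothetical min_count threshold. It is only a starting point, not the method from the cited paper, which goes further than raw counts.

```python
from collections import Counter

def mine_phrases(queries, n_values=(2, 3), min_count=2):
    """Count word n-grams across a query log and keep those seen at least min_count times."""
    counts = Counter()
    for query in queries:
        words = query.lower().split()
        for n in n_values:
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_count}

# Hypothetical query log.
log = [
    "senior java developer",
    "java developer hadoop",
    "registered nurse",
    "registered nurse boston",
    "java developer",
]
print(mine_phrases(log))  # {'java developer': 3, 'registered nurse': 2}
```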
![Page 75: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/75.jpg)
![Page 76: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/76.jpg)
![Page 77: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/77.jpg)
entity extraction
![Page 78: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/78.jpg)
![Page 79: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/79.jpg)
Demo: Solr Text Tagger
![Page 80: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/80.jpg)
semantic query parsing
![Page 81: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/81.jpg)
![Page 82: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/82.jpg)
Probabilistic Query Parser
Goal: given a query, predict which
combinations of keywords should be
combined together as phrases
Example:
senior java developer hadoop
Possible Parsings:senior, java, developer, hadoop
"senior java", developer, hadoop
"senior java developer", hadoop
"senior java developer hadoop”
"senior java", "developer hadoop”
senior, "java developer", hadoop
senior, java, "developer hadoop" Source: Trey Grainger, “Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disambiguation”, Bay Area Search Meetup, November 2015.
![Page 83: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/83.jpg)
Demo: Probabilistic Query Parser
![Page 84: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/84.jpg)
Semantic Query Parsing
Phrases in queries are identified in two steps:
1) Check a dictionary of known terms that is continuously
built, cleaned, and refined based upon common inputs from
interactions with real users of the system. The SolrTextTagger
works well for this.*
2) Invoke a probabilistic query parser to dynamically
identify unknown phrases, using statistics from a corpus of data
(a language model)
*K. Aljadda, M. Korayem, T. Grainger, C. Russell. "Crowdsourced Query Augmentation
through Semantic Discovery of Domain-specific Jargon," in IEEE Big Data 2014.
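Step 1 can be sketched as a greedy, longest-match lookup against the known-phrase dictionary; the spans the dictionary does not recognize are what step 2 would hand to the probabilistic parser. This is a plain-Python illustration with a hypothetical `KNOWN_PHRASES` set, not the SolrTextTagger's actual algorithm.

```python
# Hypothetical dictionary of known phrases, e.g. mined from user queries and cleaned.
KNOWN_PHRASES = {"java developer", "machine learning", "registered nurse"}


def tag_known_phrases(tokens, dictionary, max_len=4):
    """Greedy left-to-right longest match; returns (text, is_known) pairs."""
    i, tagged = 0, []
    while i < len(tokens):
        # Try the longest candidate first, shrinking until a dictionary hit or a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary or n == 1:
                tagged.append((candidate, candidate in dictionary))
                i += n
                break
    return tagged


def unknown_spans(tagged):
    """Group consecutive untagged tokens into spans for the probabilistic parser (step 2)."""
    spans, current = [], []
    for text, known in tagged:
        if known:
            if current:
                spans.append(current)
            current = []
        else:
            current.append(text)
    if current:
        spans.append(current)
    return spans


tagged = tag_known_phrases("senior java developer hadoop spark".split(), KNOWN_PHRASES)
print(tagged)
# [('senior', False), ('java developer', True), ('hadoop', False), ('spark', False)]
print(unknown_spans(tagged))
# [['senior'], ['hadoop', 'spark']]
```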
![Page 85: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/85.jpg)
query augmentation
![Page 86: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/86.jpg)
![Page 87: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/87.jpg)
Knowledge Graph
Semantic Data Encoded into Free Text Content
NodeType: CharacterSequence: e, en, eng, engi, engineer, engineers, …
NodeType: Term: engineer, engineers, …
NodeType: TermSequence: software engineer, software engineers, electrical engineering, engineer, engineering software, …
NodeType: Document:
id: 1, text: looking for a software engineer with degree in computer science or electrical engineering
id: 2, text: apply to be a software engineer and work with other great software engineers
id: 3, text: start a great career in electrical engineering
…
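The node types above can be approximated at index time with two standard analysis steps: prefix (edge n-gram) expansion for character sequences and word n-grams (shingles) for term sequences. A minimal sketch in plain Python, assuming whitespace tokenization; in Solr these steps roughly correspond to the `EdgeNGramFilterFactory` and `ShingleFilterFactory` filters.

```python
def edge_ngrams(term, min_len=1):
    """CharacterSequence nodes: every prefix of a term, shortest first."""
    return [term[:i] for i in range(min_len, len(term) + 1)]


def shingles(tokens, size):
    """TermSequence nodes: every run of `size` consecutive terms."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


text = "start a great career in electrical engineering"
terms = text.split()              # Term nodes
print(edge_ngrams("engineer"))    # ['e', 'en', 'eng', 'engi', 'engin', 'engine', 'enginee', 'engineer']
print(shingles(terms, 2)[-2:])    # ['in electrical', 'electrical engineering']
```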
![Page 88: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/88.jpg)
Documents
| id | job_title | desc | skills |
|---|---|---|---|
| 1 | Software Engineer | software engineer at a great company | .Net, C#, java |
| 2 | Registered Nurse | a registered nurse at hospital doing hard work | oncology, phlebotomy |
| 3 | Java Developer | a software engineer or a java engineer doing work | java, scala, hibernate |

Terms-Docs Inverted Index
| field | term | postings list (doc: positions) |
|---|---|---|
| desc | a | 1: 4; 2: 1; 3: 1, 5 |
| desc | at | 1: 3; 2: 4 |
| desc | company | 1: 6 |
| desc | doing | 2: 6; 3: 8 |
| desc | engineer | 1: 2; 3: 3, 7 |
| desc | great | 1: 5 |
| desc | hard | 2: 7 |
| desc | hospital | 2: 5 |
| desc | java | 3: 6 |
| desc | nurse | 2: 3 |
| desc | or | 3: 4 |
| desc | registered | 2: 2 |
| desc | software | 1: 1; 3: 2 |
| desc | work | 2: 8; 3: 9 |
| job_title | java developer | 3: 1 |
| … | … | … |

Docs-Terms Forward Index
| field | doc | terms |
|---|---|---|
| desc | 1 | a, at, company, engineer, great, software |
| desc | 2 | a, at, doing, hard, hospital, nurse, registered, work |
| desc | 3 | a, doing, engineer, java, or, software, work |
| job_title | 1 | Software Engineer |
| … | … | … |
Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. “The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016.
Knowledge Graph
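To make the two structures concrete, the sketch below builds both indexes for the `desc` field of the three example documents. It assumes simple whitespace tokenization and counts positions from 1 to match the table; Lucene itself counts positions from 0 and applies a full analysis chain.

```python
from collections import defaultdict

# desc field of the three example documents
DOCS = {
    1: "software engineer at a great company",
    2: "a registered nurse at hospital doing hard work",
    3: "a software engineer or a java engineer doing work",
}


def build_indexes(docs):
    """Return (inverted, forward): term -> doc -> [positions], and doc -> {terms}."""
    inverted = defaultdict(lambda: defaultdict(list))
    forward = defaultdict(set)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.split(), start=1):
            inverted[term][doc_id].append(pos)  # Terms-Docs: postings with positions
            forward[doc_id].add(term)           # Docs-Terms: terms per document
    return inverted, forward


inverted, forward = build_indexes(DOCS)
print(dict(inverted["engineer"]))  # {1: [2], 3: [3, 7]}
print(sorted(forward[3]))          # ['a', 'doing', 'engineer', 'java', 'or', 'software', 'work']
```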
![Page 89: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/89.jpg)
Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. “The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016.
Knowledge Graph
How the Graph Traversal Works
[Figure: the same traversal shown as a Set-theory View, a Graph View, and a Data Structure View, linking skill nodes (skill: Java, skill: Scala, skill: Hibernate, skill: Oncology) to the documents that contain them (doc 1 through doc 6).]
![Page 90: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/90.jpg)
Knowledge Graph
Graph Model
Structure:
Single-level Traversal / Scoring:
Multi-level Traversal / Scoring:
![Page 91: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/91.jpg)
Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. “The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016.
Knowledge Graph
Multi-level Traversal
[Figure: Data Structure View and Graph View of a multi-level traversal. Skill nodes (skill: Java, skill: Scala, skill: Hibernate, skill: Oncology) are connected to documents (doc 1 through doc 6), which in turn are connected to job_title nodes (Software Engineer, Data Scientist, Java Developer, …); the steps are labeled as alternating Inverted Index Lookup and Forward Index Lookup operations. In the Graph View, Java, Hibernate, and Scala connect to Java Developer, Software Engineer, and Data Scientist through has_related_job_title edges.]
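A minimal sketch of a two-hop traversal over the three example documents from the index slide, using in-memory dictionaries as stand-ins for the inverted index (value to documents) and the forward index (document to values). Scoring is omitted; each hop simply collects the connected nodes.

```python
from collections import defaultdict

# Skills and job titles of the three example documents (forward index: doc -> field -> values).
DOCS = {
    1: {"job_title": ["Software Engineer"], "skills": [".Net", "C#", "java"]},
    2: {"job_title": ["Registered Nurse"], "skills": ["oncology", "phlebotomy"]},
    3: {"job_title": ["Java Developer"], "skills": ["java", "scala", "hibernate"]},
}


def invert(docs, field):
    """Inverted index for one field: value -> set of doc ids (rebuilt per call for brevity)."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for value in fields[field]:
            index[value].add(doc_id)
    return index


def hop(docs, from_field, values, to_field):
    """One traversal step: inverted lookup (values -> docs), then forward lookup (docs -> values)."""
    index = invert(docs, from_field)
    doc_ids = set().union(*(index.get(v, set()) for v in values))
    return {v for d in doc_ids for v in docs[d][to_field]}


titles = hop(DOCS, "skills", {"java"}, "job_title")   # level 1: skill -> job titles
skills = hop(DOCS, "job_title", titles, "skills")     # level 2: job titles -> skills
print(sorted(titles))  # ['Java Developer', 'Software Engineer']
print(sorted(skills))  # ['.Net', 'C#', 'hibernate', 'java', 'scala']
```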
![Page 92: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/92.jpg)
Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. “The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016.
Knowledge Graph
Scoring nodes in the Graph
Foreground vs. Background Analysis
Every term is scored against its context: the more commonly a term appears within its foreground context relative to its background context, the more relevant it is to that foreground context.
$$z = \frac{\mathrm{countFG}(x) - \mathrm{totalDocsFG} \cdot \mathrm{probBG}(x)}{\sqrt{\mathrm{totalDocsFG} \cdot \mathrm{probBG}(x) \cdot \left(1 - \mathrm{probBG}(x)\right)}}$$
Foreground Query: "Hadoop"
{ "type":"keywords", "values":[
{ "value":"hive", "relatedness": 0.9765, "popularity":369 },
{ "value":"spark", "relatedness": 0.9634, "popularity":15653 },
{ "value":".net", "relatedness": 0.5417, "popularity":17683 },
{ "value":"bogus_word", "relatedness": 0.0, "popularity":0 },
{ "value":"teaching", "relatedness": -0.1510, "popularity":9923 },
{ "value":"CPR", "relatedness": -0.4012, "popularity":27089 } ] }
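The formula translates directly into code. A minimal sketch with made-up counts: it returns the raw, unbounded z-score, whereas the relatedness values shown above lie between -1 and 1, so they appear to be a normalized form of this score that the slide does not spell out.

```python
from math import sqrt


def foreground_z(count_fg, total_docs_fg, prob_bg):
    """z-score of a term's foreground count against the count expected from its background rate."""
    if not 0 < prob_bg < 1:
        raise ValueError("prob_bg must be strictly between 0 and 1")
    expected = total_docs_fg * prob_bg
    return (count_fg - expected) / sqrt(expected * (1 - prob_bg))


# Hypothetical numbers: 1,000 foreground docs; the term occurs in 2% of all docs (20 expected).
print(foreground_z(120, 1000, 0.02))  # well above 0: over-represented, strongly related
print(foreground_z(20, 1000, 0.02))   # 0.0: appears exactly at the background rate
print(foreground_z(5, 1000, 0.02))    # negative: under-represented in the foreground
```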
![Page 93: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/93.jpg)
Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. “The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016.
Knowledge Graph
Multi-level Graph Traversal with Scores
[Figure: multi-level graph traversal with scores. A starting node, "software engineer" (marked * as a materialized node), connects through has_related_skill edges to skill nodes (Java, C#, .NET, Hibernate, Scala, VB.NET), which connect through has_related_job_title edges to job title nodes (.NET Developer, Java Developer, Software Engineer, Data Scientist). Each edge carries a score, ranging from 0.34 to 0.93.]
![Page 94: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/94.jpg)
Knowledge Graph
Use Case: Document Summarization
Experiment: Pass in raw text (extracting phrases as needed), and rank those phrases by their similarity to the documents using the SKG.
Additionally, the graph can be traversed to “related” entities/keyword phrases NOT found in the original document
Applications: Content-based and multi-modal recommendations (no cold-start problem), data cleansing prior to clustering or other ML methods, semantic search / similarity scoring
![Page 95: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/95.jpg)
Demo: Semantic Knowledge Graph
![Page 96: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/96.jpg)
Knowledge Graph
![Page 97: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/97.jpg)
Knowledge Graph
![Page 98: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/98.jpg)
![Page 99: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/99.jpg)
streaming expressions
![Page 100: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/100.jpg)
• Perform relational operations on
streams
• Stream sources: search, jdbc, facets,
features, gatherNodes, shortestPath,
train, model, random, stats,
topic
• Stream decorators: classify, commit,
complement, daemon, executor, fetch,
having, leftOuterJoin, hashJoin,
innerJoin, intersect, merge, null,
outerHashJoin, parallel, priority,
reduce, rollup, scoreNodes, select,
sort, top, unique, update
Streaming Expressions
Source: “Solr 6 Deep Dive: SQL and Graph”. Grant Ingersoll & Tim Potter, 2016.
![Page 101: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/101.jpg)
Streaming Expressions - Examples
• Shortest-path Graph Traversal
• Parallel Batch Processing
• Train a Logistic Regression Model
• Distributed Joins
• Rapid Export of all Search Results
• Pull Results from an External Database
• Classifying Search Results
Sources: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html
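As a hedged illustration of the "Distributed Joins" and "Rapid Export" items, the sketch below composes an `innerJoin` of two `search` streams (using the `/export` handler) and prepares a POST to a collection's `/stream` endpoint. The collections `jobs` and `applications`, their fields, and the Solr URL are hypothetical; the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

SOLR = "http://localhost:8983/solr"  # hypothetical Solr base URL


def search(collection, q, fl, sort):
    # qt="/export" streams the full result set; fl and sort fields need docValues.
    return f'search({collection}, q="{q}", fl="{fl}", sort="{sort}", qt="/export")'


def inner_join(left, right, on):
    # innerJoin expects both streams to be sorted on the join key.
    return f'innerJoin({left}, {right}, on="{on}")'


expr = inner_join(
    search("jobs", "skills:java", "job_id,job_title", "job_id asc"),
    search("applications", "*:*", "job_id,user_id", "job_id asc"),
    on="job_id",
)
# Passing data makes this a POST; the expression goes in the "expr" parameter.
req = Request(f"{SOLR}/jobs/stream", data=urlencode({"expr": expr}).encode())
print(expr)
```

Sending it with `urllib.request.urlopen(req)` against a running SolrCloud cluster would return the joined tuples as JSON under `result-set`.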
![Page 102: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/102.jpg)
Additional References:
Southern Data Science
![Page 103: Self-learned Relevancy with Apache Solr](https://reader035.fdocuments.us/reader035/viewer/2022081323/5a6ed3e77f8b9a42298b5871/html5/thumbnails/103.jpg)
Contact Info
Trey Grainger
[email protected]
@treygrainger
http://solrinaction.com
Meetup discount (39% off): 39grainger
Other presentations: http://www.treygrainger.com