Webinar: Natural Language Search with Solr
Transcript of Webinar: Natural Language Search with Solr
Ted Sullivan, Senior Solutions Architect, lucidworks.com
Natural Language Search with Solr
The take-home word for this talk is:
CONTEXT
What I will talk about …
• Why does context matter? Phrase and contextual ambiguities in search
• Recent advances in Query Autofiltering that attack the context problem by adding “verb/preposition” disambiguation *
• Traditional ways of visualizing context in search - forging search “loops”
  • Facets
  • Typeahead
https://lucidworks.com/blog/2015/11/19/query-autofiltering-chapter-4-a-novel-approach-to-natural-language-processing/*
• Adding metadata context to Suggestions using Facets
  • Using Pivot Facets to create semantically rich suggestions
• Facets to bring user-centric context to suggestions
  • Entitlements: Security trimming of suggestions
  • User session context: Dynamic On-The-Fly Predictive Analytics!
Why Does Context Matter?
Relevance is contextual - relevant to whom under what circumstances?
Language / User Intent / Social and business factors
Ambiguities in search are often due to a failure or inability to detect context.
So, what can we do about this - or is this talk just some obvious hand-waving BS that we’ve heard a thousand times? Hopefully, not.
But that said - maybe just a little theory first …
Contextual Relationships
Semantic Context - Language, Lexicon
User Context - Intent, Agendas, Permissions, Demographics, Location
Social Context - Popularity, Common Behaviors => Recommendations
Business Context - Rules, Organization, Domain, Security
Context == Relationships
Within and between metadata “objects”
Search is an exchange of one metadata object - the query - for others - the results.
Things are related to other Things
Relationships provide context
• Static or known Relationships - defined by a knowledge graph such as an Ontology
• Discovered Relationships - computed by data mining
Knowledge Graphs - connected-ness
Usage Logs (query logs, other captured events or signals) - behavioral contexts
Clustering - unsupervised learning algorithms
Natural Language Processing - semantic contexts - noun phrases - statements
Machine Learning - supervised learning => Feature extraction
Feature Sets
“Apple” disambiguated by associated terms:
• Tim Cook: iPhone, Macintosh, Computer, Tablet, Steve Jobs, Lisa, iTunes
• Times Square: Broadway, Wall Street, Empire State Building, Bronx Zoo
• Granny Smith: Pie, Fritters, Season, Sauce, Cider, Picking, Tree, McIntosh
• White Album: Records, Beatles, George Martin, Capitol
Resolving Ambiguities
• Phrase or syntactic ambiguities - detecting nouns
  • Autophrasing - unstructured data
  • Query Autofiltering - structured data
• Contextual or semantic ambiguities (subject-verb-object) - detecting intent
  • Traditional NLP - POS detection, Machine Learning
  • Query Autofiltering with verb/preposition disambiguation
Enough abstractions - give me some examples!
Music Ontology: Song, Songwriter, Genre, Performer, Recording, Guitarist, Pianist, Vocalist, Producer, Record Label, Band, Album
Discovery and Focus
Enough abstractions - give me some examples!
Medical Ontology: Disease, Condition, Symptom, Drug, Treatment
Query Autofiltering “songs Eric Clapton wrote” vs. “songs Eric Clapton performed”
Without verb support we get:
(performer_ss:"Eric Clapton" OR composer_ss:"Eric Clapton") AND composition_type:Song
for either query.
With verb support we now get:
songs Eric Clapton wrote => composer_ss:"Eric Clapton" AND composition_type:Song
songs Eric Clapton performed => performer_ss:"Eric Clapton" AND composition_type:Song
Verb/Preposition context rules
written,wrote,composed => composer_ss
performed,played,sang,recorded => performer_ss
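The mechanics of such a rule table can be sketched in a few lines of Python. The rule syntax and field names (composer_ss, performer_ss, composition_type) come from the slides; the parsing and matching code below is only an illustration, not the actual Query Autofiltering implementation:

```python
# Sketch of verb/preposition disambiguation rules for Query Autofiltering.
# Rule syntax and field names follow the slides; the code is illustrative.

RULES = """
written,wrote,composed => composer_ss
performed,played,sang,recorded => performer_ss
"""

def parse_rules(text):
    """Map each trigger verb to its target Solr field."""
    table = {}
    for line in text.strip().splitlines():
        verbs, field = line.split("=>")
        for verb in verbs.split(","):
            table[verb.strip()] = field.strip()
    return table

def disambiguate(query, entity, table):
    """Pick the field for the recognized entity based on the verb found."""
    for verb, field in table.items():
        if verb in query.lower().split():
            return f'{field}:"{entity}" AND composition_type:Song'
    # No verb matched: fall back to an OR over all candidate fields
    fields = sorted(set(table.values()))
    ored = " OR ".join(f'{f}:"{entity}"' for f in fields)
    return f"({ored}) AND composition_type:Song"

table = parse_rules(RULES)
print(disambiguate("songs Eric Clapton wrote", "Eric Clapton", table))
# composer_ss:"Eric Clapton" AND composition_type:Song
```

Without a matching verb the sketch falls back to the OR-of-all-fields query shown above, which is exactly the "for either query" behavior of raw autofiltering.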
Query Autofiltering “Bands that Eric Clapton was in”
No context rules (raw autofiltering):
((name_s:Band OR musician_type_ss:Band) AND (name_s:"Eric Clapton" OR original_performer_s:"Eric Clapton" OR composer_ss:"Eric Clapton" OR performer_ss:"Eric Clapton" OR groupMembers_ss:"Eric Clapton"))
Add context rule
members,member,was in,is in,who's in,who's in the,is in the,was in the => memberOfGroup_ss,groupMembers_ss
((name_s:Band OR musician_type_ss:Band) AND groupMembers_ss:"Eric Clapton")
Verb/Preposition context rules
Query Autofiltering - Verb/Preposition context rules
“Who’s in The Who?”
raw autofiltering:
((name_s:"The Who" OR original_performer_s:"The Who" OR performer_ss:"The Who" OR memberOfGroup_ss:"The Who"))
with context rule
members,member,was in,is in,who's in,who's in the,is in the,was in the => memberOfGroup_ss,groupMembers_ss
query is now:
(memberOfGroup_ss:"The Who")
Query Autofiltering
Drugs that treat abdominal pain
treatment_type:Drug AND has_indication:”abdominal pain”
Drugs that cause abdominal pain
treatment_type:Drug AND has_side_effect:”abdominal pain”
vs.
treatment_type:Drug AND (has_indication:”abdominal pain” OR has_side_effect:”abdominal pain”)
Verb/Preposition context rules:
treat,for,indicated => has_indication
cause,produce => has_side_effect
Query Autofiltering
Beatles Songs covered vs Songs Beatles covered
covers by other artists of songs written by the Beatles vs covers by Beatles of songs by other songwriters
Robert Johnson Songs that Eric Clapton covered
works the same as:
Eric Clapton covers of Robert Johnson Songs
Insomnia Drugs - these are just the indicated drugs
Noun-Noun Phrases
Robert Johnson Songs
Beatles Songs
Robert Johnson Songs
Insomnia Drugs
covered,covers:performer_ss | version_s:Cover | original_performer_s:_ENTITY_,recording_type_ss:Song => original_performer_s:_ENTITY_
Facets provide Context
Visualization and the search “conversation”: Discovery and Focus
• Post-query visualization- facet display - aggregated attributes of found things
• Pre-query visualization - query suggestion or typeahead - can use facets too (stay tuned).
• The Good, The Bad and The Ugly aspects of Facets
New and Improved: Statistics, Analytics and APIs - Oh My!
• Dashboards and Dynamic Business Intelligence
• Heatmap Faceting
• Pivot Facets and Ad-Hoc Object Hierarchies - now with stats!
• JSON Facet API
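As a taste of the JSON Facet API, a nested terms facet with a per-bucket statistic can be expressed as a plain JSON request body. The collection and field names here (genre_ss, performer_ss, plays_i) are hypothetical examples, not from the slides:

```python
import json

# Sketch of a nested JSON Facet API request (Solr 5+).
# Field names (genre_ss, performer_ss, plays_i) are hypothetical.
request = {
    "query": "*:*",
    "limit": 0,                     # facet counts only, no documents
    "facet": {
        "genres": {
            "type": "terms",
            "field": "genre_ss",
            "limit": 5,
            "facet": {              # sub-facet: pivot-like nesting
                "top_performers": {
                    "type": "terms",
                    "field": "performer_ss",
                    "limit": 3,
                    "facet": {
                        "total_plays": "sum(plays_i)"  # stats per bucket
                    }
                }
            }
        }
    }
}

# POST body for e.g. a /solr/<collection>/query endpoint
body = json.dumps(request, indent=2)
print(body)
```

This is the "Pivot Facets - now with stats!" idea in one request: each genre bucket carries its own performer buckets, and each of those carries an aggregate.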
How can we use facets to improve typeahead?
Put more precision and more context into a suggester.
=> Using metadata - guide the user to more precise queries that we can be really GOOD at!
To do this, we can build a specialized suggester collection - then we can use facet contexts to build semantic and behavioral relationships within and between searches.
And now for something completely different! *
* Shameless Monty Python’s Flying Circus reference
Suggester Buildware
Query Collectors or Fetchers
Gather sets of query suggestions - Interface with multiple implementations possible
Suggester Builder
• Validates suggestions
• Adds context to suggestions using faceting
• Submits suggestion and metadata to Solr Index
Query Logs
Terms Component
Curated Lists
Pivot Facet Collector
Databases - SQL or Not
Pivot Facet Query Collector
Uses “Field Pattern Templates” to generate semantically rich suggestions
Structured data - metadata fields contain object attributes
We can combine these attributes into phrases - semantically sensible ones (or not): the machine doesn’t know the semantics.
Example
Bob Jones Accountant Cincinnati Ohio
makes sense
Ohio Accountant Jones Cincinnati Bob
doesn’t
first_name last_name occupation city state
Pivot Facet Query Collector
${musician_type} ${recording_type}s
${genre} ${musician_type}s
${performer} ${recording_type}s
Rolling Stones Albums
New Wave Songs
Classical Pianists
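A minimal sketch of how these templates could be filled from one combination of pivot facet values. The field names and values follow the slides; the real collector would walk the full pivot facet tree rather than a single hand-built dict:

```python
from string import Template

# Sketch of a Pivot Facet Query Collector: fill "Field Pattern Templates"
# with pivot-facet value combinations. The real collector reads the
# values from Solr pivot facets; this dict is one illustrative bucket.
TEMPLATES = [
    Template("${musician_type} ${recording_type}s"),
    Template("${genre} ${musician_type}s"),
    Template("${performer} ${recording_type}s"),
]

pivot_values = {
    "musician_type": "Pianist",
    "recording_type": "Album",
    "genre": "Classical",
    "performer": "Rolling Stones",
}

suggestions = [t.substitute(pivot_values) for t in TEMPLATES]
print(suggestions)
# ['Pianist Albums', 'Classical Pianists', 'Rolling Stones Albums']
```

Note that the template author, not the machine, supplies the semantics: the hand-chosen word order is what keeps "Classical Pianists" sensible and "Pianist Classicals" out of the suggester.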
If we create Pivot Template Patterns like this:
${original_performer} ${recording_type}s covered by ${performer} (plus context)
${name}
We get suggestions like this:
Beatles Songs covered by Joe Cocker
Stuck Inside of Mobile With The Memphis Blues Again
Suggester Builder - validate and contextualize
• Validate - make sure that the query works
• Contextualize - use facets to acquire “aboutness” stuff
Tests the query against the content collection
“Stuck Inside of Mobile With The Memphis Blues Again”
composition_type_ss: ["Song"]
composer_ss: ["Bob Dylan"]
genre_ss: ["Blues Rock", "Folk Rock"]
Use Cases - User Context sensitive typeahead
User Permissions: Security Trimming of Suggestions
Facet on the ACL fields of the content collection - copy the set of ACL values from each suggestion’s result set into the suggester collection
=> Don’t suggest queries that return 0 results for a given user
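A sketch of the trimming step, assuming each suggester document carries an acl_ss field copied from its result set. The field name, group names, and suggestions are all hypothetical; in practice this would be a Solr filter query against the suggester collection rather than Python-side filtering:

```python
# Sketch of security trimming for typeahead: each suggestion carries
# the ACL values copied (via faceting) from the documents it would
# return; suggest it only if the user's entitlements overlap.
# Field and group names are hypothetical.
suggester_docs = [
    {"suggestion": "quarterly revenue report", "acl_ss": ["finance", "exec"]},
    {"suggestion": "holiday schedule",         "acl_ss": ["all_employees"]},
    {"suggestion": "pending layoffs memo",     "acl_ss": ["exec"]},
]

def trim(suggestions, user_groups):
    """Drop suggestions whose queries would return 0 results for this user."""
    groups = set(user_groups)
    return [d["suggestion"] for d in suggestions
            if groups & set(d["acl_ss"])]

print(trim(suggester_docs, ["all_employees", "finance"]))
# ['quarterly revenue report', 'holiday schedule']
```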
User Behavior: Dynamic On-The-Fly Predictive Analytics
Cache context facets returned by Suggester - use as boost queries for subsequent queries in a user session
=> System learns “what” user is looking for
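One way such a session cache could look, sketched in Python. The caching scheme and function names are my illustration; the field names mirror the music examples earlier in the talk:

```python
from collections import Counter

# Sketch of on-the-fly predictive analytics: cache the context facets
# returned with each accepted suggestion, then turn the accumulated
# values into boost queries (bq) for later searches in the session.
session_context = Counter()

def record_suggestion_facets(facets):
    """Cache facet values returned alongside a chosen suggestion."""
    for field, values in facets.items():
        for value in values:
            session_context[(field, value)] += 1

def boost_queries(top_n=3):
    """Build bq clauses from the most frequent session facet values."""
    return [f'{field}:"{value}"'
            for (field, value), _ in session_context.most_common(top_n)]

record_suggestion_facets({"genre_ss": ["Blues Rock", "Folk Rock"],
                          "composer_ss": ["Bob Dylan"]})
record_suggestion_facets({"genre_ss": ["Blues Rock"]})

print(boost_queries())
# ['genre_ss:"Blues Rock"', ...] - Blues Rock seen twice, so boosted first
```

Attaching these as bq parameters biases later results toward what the session has already shown interest in, without hard-filtering anything out.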
Data Quality - Text - Metadata
Data design and curation - solve garbage in, garbage out at the source.
More fields with more precise values - combine for expressiveness
The ole Structured vs Unstructured bugaboo
Use Machine Learning / Knowledge Base Classification to add metadata
Model Building
“MODEL”: Machine Learning + Subject Matter Experts
Training Set - the “Seed Crystal” - from Subject Matter Experts => Machine Learning => Model
Query + Documents => Feature Sets => Model => Yes / No
Model: Mapping of Text => Feature Sets
Detecting and Consuming Context
(more) Structured Document Collection
Query => Query Autofiltering => Solr / Lucene => Result Set
Query Autofiltering can be used as a “normalization” layer for classification
Document Classification Stages (Manual, ML, Ontology, Hybrid)
Metadata Enrichment
(more) Structured Document Collection - The Model!
=> Can Think of the Solr/Lucene Index itself as the “Model”
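To make that idea concrete, here is a toy search-based classifier: the “model” is nothing but an enriched document collection that new text is matched against. Everything below is illustrative; a real system would issue the query against the Solr index itself rather than a Python list:

```python
# Toy sketch of "the index as the model": classify new text by matching
# it against an already-enriched collection and taking the category of
# the best match. Entirely illustrative data and matching logic.
enriched_index = [
    {"text": "ibuprofen relieves abdominal pain", "category": "Drug"},
    {"text": "abdominal pain is a symptom of appendicitis", "category": "Symptom"},
    {"text": "appendectomy treats appendicitis", "category": "Treatment"},
]

def classify(text):
    """Pick the category of the document sharing the most terms."""
    terms = set(text.lower().split())
    best = max(enriched_index,
               key=lambda d: len(terms & set(d["text"].split())))
    return best["category"]

print(classify("does ibuprofen help with pain"))
# Drug
```

The enrichment stages above (manual, ML, ontology, hybrid) are what make this work: the better the metadata in the index, the better the index performs as a classifier.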