Intelligent Information Discovery: Machine Driven Search - Liferay DEVCON 2017, Amsterdam,...


Page 1: Intelligent Information Discovery: Machine Driven Search - Liferay DEVCON 2017, Amsterdam, Netherlands

André de Oliveira | @arbocombr

Intelligent Information Discovery: Machine Driven Search

André de Oliveira, Search Engineering Lead, Liferay Inc.

Page 2:

The explosion of content

Page 3:

Page 4:

What is your information discovery strategy?

Page 5:

▪ 1st Generation: Browse

What is your discovery strategy?

Page 6:

▪ Customer "Journey" - with a handrail...
▪ Breadcrumbs...
▪ Product catalog with category tree...
▪ Early "search" with SQL queries...

- Click oriented information discovery -
(Still the strategy of many legacy applications)

1st Generation: Browsing for content

Discovery gateway: "The Navigation Menu"

Page 7:

▪ 1st Generation: Browse
▪ 2nd Generation: Search

What is your discovery strategy?

Page 8:

▪ Search engines; full text search; facets
▪ Customer Journey: freedom from "navigation"
▪ User → Application: trial and error until found
▪ Results "in the now" - missed new content

- Keyword oriented information discovery -
(Strategy of choice for modern applications)

2nd Generation: Searching for content

Discovery gateway: "The Search Bar"

Page 9:

▪ 1st Generation: Browse
▪ 2nd Generation: Search
▪ Next Generation: Predict

What is your discovery strategy?

Page 10:

▪ Every user action is an indication of interest
▪ Searches + Browsing paths + Purchase history...

▪ New content, matching interest? Show it now
▪ Application → User: find for you - "search-less"
▪ Customer Journey continually improves itself

- Interest oriented information discovery -
Dominant discovery strategy of the future.

Next Generation: Predicting relevant content

Discovery gateway: "Everywhere"

Page 11:

Intelligent information discovery

Page 12:

▪ Beyond the database
▪ More than filters: scores
▪ Information retrieval
▪ Full text queries
▪ Ranking and relevance algorithms

It all starts with Search...

How to use a search engine to predict relevant content?

Page 13:

Input = Keywords
Output = Scored predictions

Predicting with a search engine

Using... | Calculate score of...
Autocomplete | User input against document titles
Did you mean…? | User input against spellcheck dictionary
Suggest as you type | User input against popular queries
More like this | Whole result documents against other documents
Percolators | Whole new documents against predefined queries
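
To make "Input = Keywords, Output = Scored predictions" concrete, here is a minimal sketch of the "Did you mean…?" row. It scores user input against a spellcheck dictionary using Python's standard `difflib`. The dictionary contents, the `did_you_mean` name, and the 0.6 threshold are assumptions for illustration only; a real search engine uses its own suggesters and index statistics.

```python
from difflib import SequenceMatcher

# Hypothetical spellcheck dictionary; a real engine would derive it from indexed terms.
DICTIONARY = ["liferay", "library", "search", "percolator", "elasticsearch"]


def did_you_mean(user_input, dictionary=DICTIONARY, min_score=0.6):
    """Return (candidate, score) pairs, best first, for terms similar to user_input."""
    scored = [
        (term, SequenceMatcher(None, user_input.lower(), term).ratio())
        for term in dictionary
    ]
    # Keep only candidates above the threshold, highest similarity first.
    return sorted(
        [(term, score) for term, score in scored if score >= min_score],
        key=lambda pair: pair[1],
        reverse=True,
    )


suggestions = did_you_mean("elasticsaerch")
print(suggestions)
```

Each suggestion comes with a score between 0 and 1, so the caller can decide whether to show "Search for … instead" or stay silent.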

Page 14:

▪ Data Science and Machine Learning
▪ Neural networks can be trained to make predictions

▪ A scored guess that best matches prior known results
▪ Universal, reusable mathematical algorithms

▪ Regression, Classification, Clustering...
▪ A trained neural network is like an API

▪ As long as you can feed it numerical input

Input = Numbers
Output = Scored predictions

Rise of the machines

Page 15:

Modeling your input universe as meaningful numbers

The essential challenge in Machine Learning

Picture?

Voice?

Search query?

User indication of interest?

Scored prediction

Scored prediction

Page 16:

Pictures ⇒ Pixels ⇒ Numbers

Image classification

Page 17:

Sound waves ⇒ Frequencies and amplitudes ⇒ Numbers

Speech recognition

Page 18:

Facial landmarks ⇒ Measurements ⇒ Numbers

Face detection

Page 19:

Universe ⇒ Numerical model ⇒ Algorithms ⇒ Scored predictions

Predicting with Deep Learning: Confidence

Picture?

Voice?

Search query?

User indication of interest?

Scored prediction

Scored prediction

#LRDEVCON 2017 Highlight

"Machine Learning and DXP - The best of 2 worlds"
Carlos Hernandez & Filipe Afonso, Senior Consultants, Liferay

Page 20:

▪ A similarity is a scoring / ranking model
▪ Leverage information retrieval algorithms

▪ BM25
▪ divergence from randomness
▪ divergence from independence
▪ information based...

▪ Can be mapped per field - fine tuning
▪ Some models suit shorter fields better (BM25)

▪ Elasticsearch Similarity Module

Predicting with Search: Similarity

"value" : 2.7051764,
"description" : "score(doc=0,freq=1.0), product of:",
"details" : [
  {
    "value" : 0.66422296,
    "description" : "queryWeight, product of:",
    "details" : [
      { "value" : 4.0726933, "description" : "idf(docFreq=4, maxDocs=108)", "details" : [ ] },
      { "value" : 0.16309182, "description" : "queryNorm", "details" : [ ] }
    ]
  },
  {
    "value" : 4.0726933,
    "description" : "fieldWeight in 0, product of:",
    "details" : [
      {
        "value" : 1.0,
        "description" : "tf(freq=1.0), with freq of:",
        "details" : [
          { "value" : 1.0, "description" : "termFreq=1.0", "details" : [ ] }
        ]
      },
      { "value" : 4.0726933, "description" : "idf(docFreq=4, maxDocs=108)", "details" : [ ] },
      {
        "value" : 1.0,
        "description" : "fieldNorm(doc=0)",
        "details" : [ ]

Elasticsearch Explain API
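
One way to read the explanation output above is as a tree in which every "product of:" node is the product of its children. The sketch below transcribes the values shown on the slide into a Python dict and checks that arithmetic. The output on the slide is cut off, so the closing brackets in the dict are an assumption, and `check_products` is a hypothetical helper, not part of any Elasticsearch client.

```python
import math

# Values copied from the explanation above; the closing brackets are assumed,
# because the output on the slide is truncated.
explanation = {
    "value": 2.7051764,
    "description": "score(doc=0,freq=1.0), product of:",
    "details": [
        {
            "value": 0.66422296,
            "description": "queryWeight, product of:",
            "details": [
                {"value": 4.0726933, "description": "idf(docFreq=4, maxDocs=108)", "details": []},
                {"value": 0.16309182, "description": "queryNorm", "details": []},
            ],
        },
        {
            "value": 4.0726933,
            "description": "fieldWeight in 0, product of:",
            "details": [
                {
                    "value": 1.0,
                    "description": "tf(freq=1.0), with freq of:",
                    "details": [{"value": 1.0, "description": "termFreq=1.0", "details": []}],
                },
                {"value": 4.0726933, "description": "idf(docFreq=4, maxDocs=108)", "details": []},
                {"value": 1.0, "description": "fieldNorm(doc=0)", "details": []},
            ],
        },
    ],
}


def check_products(node, rel_tol=1e-6):
    """Recursively verify that each "product of:" node equals the product of its children."""
    ok = True
    if node["description"].endswith("product of:"):
        expected = math.prod(child["value"] for child in node["details"])
        ok = math.isclose(node["value"], expected, rel_tol=rel_tol)
    return ok and all(check_products(child, rel_tol) for child in node["details"])


print(check_products(explanation))
```

In other words, the final score is queryWeight (idf × queryNorm) multiplied by fieldWeight (tf × idf × fieldNorm).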

Page 21:

▪ Numerical representation of a search
▪ TF–IDF = Term Frequency–Inverse Document Frequency
▪ Tokenize the text
▪ Create appropriate term vectors
▪ Prepare a TF–IDF matrix

Textual analysis

Document 1

Term | Term Count
this | 1
is | 1
a | 2
sample | 1

Document 2

Term | Term Count
this | 1
is | 1
another | 2
example | 3
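
The steps above (term counts, term vectors, TF–IDF matrix) can be sketched for the two sample documents as follows. This is a minimal illustration that assumes the textbook definitions tf = count / document length and idf = ln(N / document frequency); search engines typically use variations of these formulas.

```python
import math

# Term counts taken from the two sample documents above.
doc1 = {"this": 1, "is": 1, "a": 2, "sample": 1}
doc2 = {"this": 1, "is": 1, "another": 2, "example": 3}
docs = [doc1, doc2]


def tf(term, doc):
    """Term frequency: share of the document's tokens that are this term."""
    return doc.get(term, 0) / sum(doc.values())


def idf(term, docs):
    """Inverse document frequency: log of (number of documents / documents containing the term)."""
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / containing)


vocabulary = sorted({term for doc in docs for term in doc})
# One row per document, one column per vocabulary term.
tfidf_matrix = [[tf(term, doc) * idf(term, docs) for term in vocabulary] for doc in docs]

for term, w1, w2 in zip(vocabulary, *tfidf_matrix):
    print(f"{term:8s} {w1:.3f} {w2:.3f}")
```

Terms that occur in both documents ("this", "is") get a weight of zero, while "example", frequent in Document 2 and absent from Document 1, gets the highest weight.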

Page 22:

TF–IDF (Term Frequency–Inverse Document Frequency)

Factor | In question form... | Score increases...
Term Frequency | How often does a term appear in a field? | When the term appears many times in the text
Inverse Document Frequency | How rare is the term in the whole index? | When the term is found in this document and not many others
Field-length Norm | How short is the field where the term is? | When there isn't much else in the same field (like a title)
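
The three factors can be written as small functions. The formulas below are the ones commonly documented for Lucene's classic TF–IDF similarity and are an assumption here, since the slides do not state them; they do appear consistent with the idf(docFreq=4, maxDocs=108) value of 4.0726933 shown in the Explain API output earlier. Real engines also store the field-length norm with reduced precision, which this sketch ignores.

```python
import math


def term_frequency(freq):
    """Grows with the number of occurrences, with diminishing returns."""
    return math.sqrt(freq)


def inverse_document_frequency(doc_freq, max_docs):
    """Grows as the term gets rarer across the index."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_length_norm(num_terms):
    """Grows as the field gets shorter."""
    return 1.0 / math.sqrt(num_terms)


# A term found in 4 of 108 documents, as in the Explain API output.
print(round(inverse_document_frequency(4, 108), 4))
# A 4-term title outweighs a 400-term body for the same match.
print(field_length_norm(4), field_length_norm(400))
```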

Page 23:

▪ Sanitize out "stopwords"
▪ Irrelevant words and phrases

▪ Spell check aggressively
▪ "Search for … instead"

▪ Predict non text fields as well
▪ Serial numbers (and how to analyze them)

▪ More than one way to say the same thing
▪ Synonyms; alternate spellings; separate v. combined words

▪ Contextualize (every application is different)
▪ The "adwords" effect - search for this, always show that

Improving textual analysis for better predictions

Page 24:

Machine driven search

Page 25:

▪ Predict relatable searches
▪ Given a user initiated search,
▪ in a universe of searches from all customers,
▪ classify by similar interest group,
▪ suggest and push predicted relatable results

▪ A user search is an indication of interest in itself
▪ Algorithms / recommender systems
▪ Smart content delivery

Machine Driven Search

Page 26:

▪ Autocomplete: User input → Document titles
▪ Suggest as you type: User input → Popular queries
▪ Base case is quickly exhausted; be creative
▪ Primary field (e.g. title)

▪ Combine multiple queries for score - match, prefix, phrase
▪ Multiple fields

▪ More matches, and may be what they're really looking for
▪ Multiple target entities

▪ Why limit to one kind of content?
▪ Organize and render drop list accordingly

Predict with Search: Instant results in search bar
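
A minimal sketch of "combine multiple queries for score" across multiple fields, written as an Elasticsearch query body built in Python. The `match`, `match_phrase`, and `match_phrase_prefix` query types and the `bool`/`should` clause are standard Elasticsearch query DSL; the field names and the `instant_results_query` helper are assumptions for illustration.

```python
import json


def instant_results_query(user_input, fields=("title", "description")):
    """Build a bool query whose "should" clauses each add to a document's score."""
    should = []
    for field in fields:
        should.append({"match": {field: user_input}})                # any term matches
        should.append({"match_phrase": {field: user_input}})         # terms in order
        should.append({"match_phrase_prefix": {field: user_input}})  # last term as a prefix
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


body = instant_results_query("machine lea")
print(json.dumps(body, indent=2))
```

A document that matches as a phrase and as a prefix collects score from several clauses, so it tends to rise above documents that only share a single term with the input.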

Page 27:

▪ Find content similar to a previous existing document
▪ and/or additional user input
▪ and/or arbitrary content

▪ MLT on specific target fields
▪ “title”, “description”, “content”

▪ TF–IDF is key
▪ Input is analyzed same as target fields
▪ TF–IDF is calculated for all terms
▪ Top terms with highest TF–IDF are selected
▪ Combined "OR" query with top terms only

Predict with Search: More Like This
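
The More Like This steps listed above (analyze, compute TF–IDF, keep the top terms, run an "OR" query) can be sketched in plain Python. Everything here is a simplification under stated assumptions: the mini index, the whitespace analyzer, the idf formula, and the scoring by number of matched terms are illustrative and are not how a search engine implements MLT internally.

```python
import math
from collections import Counter

# Hypothetical mini index: document id -> text of the target field.
INDEX = {
    "a": "search engines rank documents by relevance",
    "b": "neural networks can be trained to make predictions",
    "c": "relevance ranking in search engines uses term statistics",
}


def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def top_terms(text, index, max_query_terms=3):
    """Pick the terms of `text` with the highest TF-IDF relative to the index."""
    counts = Counter(analyze(text))
    n_docs = len(index)
    weights = {}
    for term, freq in counts.items():
        doc_freq = sum(1 for body in index.values() if term in analyze(body))
        if doc_freq == 0:
            continue  # a term absent from the index cannot match anything
        weights[term] = freq * (1.0 + math.log(n_docs / (doc_freq + 1)))
    return sorted(weights, key=weights.get, reverse=True)[:max_query_terms]


def more_like_this(text, index, max_query_terms=3):
    """OR query over the top terms; score = number of top terms a document contains."""
    terms = top_terms(text, index, max_query_terms)
    scores = {doc_id: sum(t in analyze(body) for t in terms) for doc_id, body in index.items()}
    return sorted(((d, s) for d, s in scores.items() if s > 0), key=lambda p: p[1], reverse=True)


similar = more_like_this("search engines and relevance", INDEX)
print(similar)
```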

Page 28:

▪ “Classify” content from users, given any number of rules
▪ Rules are registered as “percolator queries”

▪ Submit incoming documents to rule set
▪ “Would this doc match this query?”

▪ Response indicates rules that matched
▪ Content can then be “classified” accordingly

Predict with Search: Percolate
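
The idea behind percolation, matching a new document against stored queries rather than a query against stored documents, can be sketched in a few lines. The rule names, the "all terms must be present" rule semantics, and the `percolate` function are assumptions made for this example; Elasticsearch percolator queries can be arbitrary queries.

```python
# Hypothetical rule set: rule name -> terms an incoming document must contain.
RULES = {
    "search-news": {"search", "elasticsearch"},
    "commerce": {"commerce"},
    "ml": {"machine", "learning"},
}


def percolate(document_text, rules=RULES):
    """Return the names of all registered rules that the incoming document matches."""
    tokens = set(document_text.lower().split())
    # A rule matches when every one of its required terms occurs in the document.
    return sorted(name for name, required in rules.items() if required <= tokens)


matched = percolate("Liferay Commerce is powered by search and machine learning")
print(matched)
```

The returned rule names are what lets the application "classify" the incoming content, for example by tagging it or routing it to subscribers of that rule.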

#LRDEVCON 2017 Highlight

"Going in reverse to move forward: How reverse querying gives you fully automated publishing"

Jan Verweij, Sales Engineer, Liferay

Page 29:

▪ Store “successful” queries for users
▪ Definition of “successful” according to your Customer Journey
▪ Indication of interest

▪ Cluster directly visited content with successful result hits
▪ To further refine content relevancy

▪ Cluster successful queries from users with a similar journey
▪ “Users belonging to customer profile X also search for Y”
▪ Data Science and Single Customer View

Predict with Search: Recommended for YOU
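
A minimal sketch of "users belonging to customer profile X also search for Y". It assumes each stored successful query already carries a customer profile label, so it groups by that label instead of computing clusters; the log contents and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical log of "successful" queries: (customer profile, query).
SUCCESSFUL_QUERIES = [
    ("developer", "elasticsearch"),
    ("developer", "osgi modules"),
    ("developer", "elasticsearch"),
    ("marketer", "audience targeting"),
    ("marketer", "elasticsearch"),
    ("marketer", "audience targeting"),
]


def queries_by_profile(log):
    """Group successful queries by customer profile and count them."""
    grouped = defaultdict(Counter)
    for profile, query in log:
        grouped[profile][query] += 1
    return grouped


def recommend(profile, already_searched, log=SUCCESSFUL_QUERIES, limit=3):
    """Most common successful queries in the profile that this user has not run yet."""
    ranked = queries_by_profile(log)[profile].most_common()
    return [query for query, _ in ranked if query not in already_searched][:limit]


print(recommend("developer", already_searched={"elasticsearch"}))
print(recommend("marketer", already_searched=set()))
```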

#LRDEVCON 2017 Highlight

"Single Customer View Demystified"
Jonathan Lee, Product Manager, Liferay

Page 30:

▪ Discovery gateway: everywhere
▪ Front page (Since your last visit)
▪ Search bar (As you type)
▪ Search results (Also looked for)
▪ Content view (You may also like)
▪ Push notifications (Never miss another)

▪ Match new content
▪ previous "successful" actions from user

▪ Anticipate and influence
▪ Ever-improving Intelligent Information Discovery

Smart Content Delivery

Page 31:

What’s next for Liferay Search

Page 32:

Elasticsearch 6

Page 33:

▪ Rethinking Indexer architecture for extensibility
▪ Composition over inheritance
▪ Small, single-purpose components
▪ New extension points
▪ Reusable
▪ Easier to test

Modular Search infrastructure

Page 34:

▪ Building custom Search experiences
▪ Reusable and flexible
▪ Search Page Templates
▪ Search Bar
▪ Search Results
▪ Multi Selection Facets
▪ Insights
▪ Map
▪ … and your own Search-aware Portlets

New Search Components

Page 35:

▪ Powered by Search, end to end
▪ Improved User Experience
▪ Data driven - and faster than the database
▪ Ready for Intelligent Information Discovery

Liferay Commerce

#LRDEVCON 2017 Highlight

"Liferay Commerce: A Preview of Our Upcoming Features"
Marco Leo, Software Architect, Liferay

Page 36:

Intelligent Information Discovery

Liferay DXP

Search engine (Elasticsearch 6)
- Autocomplete
- Did you mean…?
- Suggest as you type
- More like this
- Percolators
- Similarity

AIaaS (APIs)
- Image classification
- Speech recognition
- Face detection

Data Driven application infrastructure (WeDeploy)
- Search classification
- Customer interest prediction
- Machine learning trained models
- Recommendation engines

Page 37:

Thank you, and many discoveries at #LRDEVCON