Search relevancy

Relevancy in Commerce Search Klaus Herrmann – hybris software


Page 1: Search relevancy

Relevancy in Commerce Search

Klaus Herrmann – hybris software

Page 2: Search relevancy

-  Free text search is the most popular way to find products in online shops!

-  Irrelevant results quickly lead to user frustration
   -  Who really goes to page 2 and beyond if the first page shows garbage?

-  Search in eCommerce is very often powered by document search engines not optimized to find products

-  Apache Solr and Elasticsearch are the most popular (open source) choices

-  Commercial alternatives are abundant, but often expensive, heavyweight, hard to integrate, ...
   -  Endeca, Fact Finder, Fredhopper, Compario, Pertimm, ...
   -  ... and not necessarily better

-  Most of us will have to make relevancy work

Using Full Text Search Engines to Find Products

Page 3: Search relevancy

Full Text Search Relevancy: TF-IDF

Page 4: Search relevancy

Full Text Search Relevancy: TF-IDF

Is “sale” in “Sale! T-shirts! Sale!” really twice as relevant as in just “T-shirt Sale!”?
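The question above can be made concrete with a minimal sketch. It assumes a simple tokenizer (lowercase, split on non-alphanumerics) that is not taken from the slides, and compares three ways of turning the raw count into a term-frequency weight: the raw count, square-root damping (the approach used by Lucene's classic similarity), and a binary "present or not" flag.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split on non-alphanumerics, and count each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


titles = ["Sale! T-shirts! Sale!", "T-shirt Sale!"]

# Raw count of "sale" per title: the first title mentions it twice.
raw_tf = [term_frequencies(title)["sale"] for title in titles]

# Square-root damping shrinks the gap between one and two mentions.
damped_tf = [math.sqrt(tf) for tf in raw_tf]

# Binary TF ignores repetition entirely: the term is either there or not.
binary_tf = [1 if tf > 0 else 0 for tf in raw_tf]

print(raw_tf)     # [2, 1]
print(binary_tf)  # [1, 1]
```

With raw counts the shouting title scores twice as high for "sale"; with binary TF both titles are treated the same, which is often closer to what a shopper expects.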

Page 5: Search relevancy

Full Text Search Relevancy: TF-IDF

IDF counts a term across all documents, which includes all categories and catalogs, regardless of the current context and filters!

Are rare terms necessarily more meaningful?
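A small sketch of the context problem, assuming the textbook definition idf = log(N / df) (real engines use smoothed variants) and a made-up six-product catalog. The same term gets a different weight depending on whether document counts come from the whole catalog or only from the category the user has filtered to.

```python
import math

# Hypothetical toy catalog, for illustration only.
catalog = [
    {"title": "red running shoes", "category": "shoes"},
    {"title": "red leather boots", "category": "shoes"},
    {"title": "red canvas sneakers", "category": "shoes"},
    {"title": "blue cotton shirt", "category": "shirts"},
    {"title": "green wool sweater", "category": "knitwear"},
    {"title": "black denim jeans", "category": "trousers"},
]


def idf(term, docs):
    """Textbook IDF: log(N / df); returns 0.0 if the term never occurs."""
    n = len(docs)
    df = sum(1 for doc in docs if term in doc["title"].split())
    return math.log(n / df) if df else 0.0


# Counted over the whole catalog, "red" appears in 3 of 6 titles.
global_idf = idf("red", catalog)

# Counted only within the "shoes" filter, "red" appears in 3 of 3 titles.
shoes = [doc for doc in catalog if doc["category"] == "shoes"]
contextual_idf = idf("red", shoes)

# A term that happens to be rare gets the largest weight of all.
rare_idf = idf("denim", catalog)
```

Here `global_idf` is log 2, `contextual_idf` is 0, and `rare_idf` is log 6: the weight of "red" depends on which documents are counted, and "denim" is boosted simply because few products mention it.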

Page 6: Search relevancy

-  TF-IDF can do more harm than good when searching (semi-)structured data
   -  Potentially bad score discounting for frequent terms
   -  Confusing boosts to rare but meaningless keywords
   -  E.g. matching the colour (red / green / ...) of a product should have a comparable effect on scoring regardless of the number of products you sell in that colour

-  High-tech algorithmic fixes exist
   -  Research on better versions of TF-IDF
   -  Model “information gain” of keywords in contexts, e.g. as done by Etsy.com

-  Lower-tech fixes exist, too!
   -  TF=1, IDF=1 for product titles, brands, colours – structured data
   -  Use boosts and functions, e.g. to push newer, cheaper, high-margin, ... products

   “q=keyword OR (inStock:true^100)”
   “boost=recip(ms(NOW/HOUR,pubdate),3.16e-11,0.08,0.05)”

-  Field weights matter a great deal: SKU > Brand > Colour > Title > Description
   -  Clean, well-structured product data tops all.
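The "TF=1, IDF=1" idea combined with field weights can be sketched as follows. This is only an illustration, not how Solr or Elasticsearch score internally: the weight values are hypothetical and merely respect the ordering SKU > Brand > Colour > Title > Description, and the product records are invented.

```python
import re

# Hypothetical weights; only their ordering comes from the slide.
FIELD_WEIGHTS = {
    "sku": 16.0,
    "brand": 8.0,
    "colour": 4.0,
    "title": 2.0,
    "description": 1.0,
}


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, product):
    """Binary match per query term and field: each hit adds the field weight.

    Repeating a term inside a field (TF) and its rarity in the catalog (IDF)
    have no influence on the result.
    """
    query_terms = tokens(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        matched = query_terms & tokens(product.get(field, ""))
        total += weight * len(matched)
    return total


red_shirt = {
    "sku": "A100", "brand": "Acme", "colour": "red",
    "title": "Red shirt",
    "description": "A red shirt, red as a red rose",
}
blue_shirt = {
    "sku": "B200", "brand": "Bolt", "colour": "blue",
    "title": "Blue shirt",
    "description": "Goes well with red trousers",
}

print(score("red shirt", red_shirt))   # 10.0
print(score("red shirt", blue_shirt))  # 3.0
```

The description of the first product repeats "red" four times, yet it contributes the same as a single mention; the colour field match is what separates the two products.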

Beyond TF-IDF
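To see what the recency boost quoted on this slide does, here is a rough sketch that re-implements Solr's `recip(x,m,a,b)` function, which computes a / (m*x + b), in plain Python. The constant 3.16e-11 is roughly one divided by the number of milliseconds in a year, so m*x is approximately the product's age in years. The helper names are made up for this example.

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # about 3.16e10 milliseconds


def recip(x, m, a, b):
    """Reciprocal function as defined by Solr: a / (m * x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms):
    """Boost from the slide: recip(ms(NOW/HOUR,pubdate),3.16e-11,0.08,0.05)."""
    return recip(age_ms, 3.16e-11, 0.08, 0.05)


brand_new = recency_boost(0)                   # about 1.6
one_year_old = recency_boost(MS_PER_YEAR)      # about 0.076
five_years_old = recency_boost(5 * MS_PER_YEAR)

for label, value in [("new", brand_new), ("1 year", one_year_old),
                     ("5 years", five_years_old)]:
    print(f"{label}: {value:.4f}")
```

With these parameters a product published just now gets a multiplier of about 1.6, which falls off quickly as the product ages, so newer products are pushed up without excluding older ones.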

Page 9: Search relevancy

-  Other factors are at least as important

-  Performance: “2 second rule”
   -  Especially when paging

-  Meaningful facet filters to help navigate results

-  Visual presentation makes browsing more fun
   -  Especially in fashion

-  Help your users phrase good queries
   -  Autocomplete keywords
   -  Category and brand suggestions
   -  Spell checking
   -  Search-as-you-type results

Relevancy is complex. Do we really have to?
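Two of the query aids listed on this slide, keyword autocomplete and spell checking, can be sketched with the Python standard library alone. The vocabulary is a hypothetical stand-in for terms harvested from a product index, and the function names are invented; a production shop would more likely rely on the suggester components of its search engine.

```python
import bisect
import difflib

# Hypothetical vocabulary, kept sorted so prefix lookups can use bisect.
VOCABULARY = sorted([
    "jacket", "jeans", "jersey", "jumper",
    "sandals", "shirt", "shoes", "shorts",
])


def autocomplete(prefix, limit=5):
    """Return up to `limit` vocabulary terms that start with the prefix."""
    start = bisect.bisect_left(VOCABULARY, prefix)
    suggestions = []
    for term in VOCABULARY[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match either
        suggestions.append(term)
        if len(suggestions) == limit:
            break
    return suggestions


def did_you_mean(word):
    """Return the closest vocabulary term, or None if nothing is similar."""
    matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.7)
    return matches[0] if matches else None


print(autocomplete("sh"))      # ['shirt', 'shoes', 'shorts']
print(did_you_mean("shooes"))  # shoes
```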

Page 10: Search relevancy

-  Scoring for human beings (Elasticsearch):
   https://speakerdeck.com/elasticsearch/scoring-for-human-beings
   http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html

-  Beyond TF-IDF (Etsy.com):
   http://www.slideshare.net/lucenerevolution/beyond-tf-idf-why-what-how
   http://www.youtube.com/watch?v=C25txE_dq90

Further Reading
